LSP介绍并实现语言服务

LSP (Language Server Protocol) 介绍

前段时间我为Jimmer DTO实现了一个 LSP 的语言服务,这是我第一次实现 LSP,所以在这里我分享一下我实现LSP的经验。

首先来看一下效果,图片太多,我就放一部分,更多的可以看jimmer-dto-lsp

属性提示

结构

触摸

高亮

LSP 是一种协议,用于在 IDE 和语言服务器之间通信。IDE 通过 LSP 请求语言服务器提供代码分析服务,语言服务器通过 LSP 响应 IDE 的请求。在没有 LSP 之前,每个 IDE 都需要为每种语言实现一套代码分析服务,而 LSP 的出现使得 IDE 只需要实现一套 LSP 协议,就可以使用任何支持 LSP 的语言服务器。所以就大大降低了 IDE 的开发成本。

列如,需要从一个地方跳转到其他地方,IDE 会发送一个请求,位置是第 3 行第 12

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "textDocument/definition",
  "params": {
    "textDocument": {
      "uri": "file:///p%3A/mseng/VSCode/Playgrounds/cpp/use.cpp"
    },
    "position": {
      "line": 3,
      "character": 12
    }
  }
}

之后服务端会返回一个响应,位置是第 0 行第 4 列到第 0 行第 11 列,这样 IDE 就可以跳转到这个位置

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "uri": "file:///p%3A/mseng/VSCode/Playgrounds/cpp/provide.cpp",
    "range": {
      "start": {
        "line": 0,
        "character": 4
      },
      "end": {
        "line": 0,
        "character": 11
      }
    }
  }
}

实现

上面的例子中是使用纯文本实现的,我们可以直接使用封装好的库,比如lsp4j。由于只是简单的教学,我这里只实现代码的高亮,语言是JSON5,词法分析就使用antlr4

首先我们需要创建一个Gradle项目,下面是我们项目中需要的所有依赖和插件。

[versions]
kotlin = "2.1.0"
antlr = "4.13.0"
lsp4j = "0.23.1"
shadow = "9.0.0-beta4"
[libraries]
antlr = { group = "org.antlr", name = "antlr4", version.ref = "antlr" }
lsp4j = { module = "org.eclipse.lsp4j:org.eclipse.lsp4j", version.ref = "lsp4j" }
[plugins]
kotlin-jvm = { id = "org.jetbrains.kotlin.jvm", version.ref = "kotlin" }
shadow = { id = "com.gradleup.shadow", version.ref = "shadow" }

接着创建一个叫langauge的子项目,并在src\main\antlr\cn\enaium\j5下创建一个J5.g4文件。

import org.gradle.kotlin.dsl.withType
import org.jetbrains.kotlin.gradle.tasks.KotlinCompile

plugins {
    alias(libs.plugins.kotlin.jvm)
    antlr
}

repositories {
    mavenCentral()
}

dependencies {
    antlr(libs.antlr)
    testImplementation(kotlin("test"))
}

tasks.test {
    useJUnitPlatform()
}

tasks.withType<Jar>().configureEach {
    dependsOn(tasks.withType<AntlrTask>())
}

tasks.withType<KotlinCompile>().configureEach {
    dependsOn(tasks.withType<AntlrTask>())
}

grammars-v4中找到JSON5g4文件,之后将grammar JSON5;改为grammar J5;,将单行注释和多行注释的 -> skip给去掉。

// Student Main
// 2020-07-22
// Public domain

// JSON5 is a superset of JSON, it included some feature from ES5.1
// See https://json5.org/
// Derived from ../json/JSON.g4 which original derived from http://json.org

// $antlr-format alignTrailingComments true, columnLimit 150, minEmptyLines 1, maxEmptyLinesToKeep 1, reflowComments false, useTab false
// $antlr-format allowShortRulesOnASingleLine false, allowShortBlocksOnASingleLine true, alignSemicolons hanging, alignColons hanging

grammar J5;

json5
    : value? EOF
    ;

obj
    : '{' pair (',' pair)* ','? '}'
    | '{' '}'
    ;

pair
    : key ':' value
    ;

key
    : STRING
    | IDENTIFIER
    | LITERAL
    | NUMERIC_LITERAL
    ;

value
    : STRING
    | number
    | obj
    | arr
    | LITERAL
    ;

arr
    : '[' value (',' value)* ','? ']'
    | '[' ']'
    ;

number
    : SYMBOL? (NUMERIC_LITERAL | NUMBER)
    ;

// Lexer

SINGLE_LINE_COMMENT
    : '//' .*? (NEWLINE | EOF)
    ;

MULTI_LINE_COMMENT
    : '/*' .*? '*/'
    ;

LITERAL
    : 'true'
    | 'false'
    | 'null'
    ;

STRING
    : '"' DOUBLE_QUOTE_CHAR* '"'
    | '\'' SINGLE_QUOTE_CHAR* '\''
    ;

fragment DOUBLE_QUOTE_CHAR
    : ~["\\\r\n]
    | ESCAPE_SEQUENCE
    ;

fragment SINGLE_QUOTE_CHAR
    : ~['\\\r\n]
    | ESCAPE_SEQUENCE
    ;

fragment ESCAPE_SEQUENCE
    : '\\' (
        NEWLINE
        | UNICODE_SEQUENCE       // \u1234
        | ['"\\/bfnrtv]          // single escape char
        | ~['"\\bfnrtv0-9xu\r\n] // non escape char
        | '0'                    // \0
        | 'x' HEX HEX            // \x3a
    )
    ;

NUMBER
    : INT ('.' [0-9]*)? EXP? // +1.e2, 1234, 1234.5
    | '.' [0-9]+ EXP?        // -.2e3
    | '0' [xX] HEX+          // 0x12345678
    ;

NUMERIC_LITERAL
    : 'Infinity'
    | 'NaN'
    ;

SYMBOL
    : '+'
    | '-'
    ;

fragment HEX
    : [0-9a-fA-F]
    ;

fragment INT
    : '0'
    | [1-9] [0-9]*
    ;

fragment EXP
    : [Ee] SYMBOL? [0-9]*
    ;

IDENTIFIER
    : IDENTIFIER_START IDENTIFIER_PART*
    ;

fragment IDENTIFIER_START
    : [\p{L}]
    | '$'
    | '_'
    | '\\' UNICODE_SEQUENCE
    ;

fragment IDENTIFIER_PART
    : IDENTIFIER_START
    | [\p{M}]
    | [\p{N}]
    | [\p{Pc}]
    | '\u200C'
    | '\u200D'
    ;

fragment UNICODE_SEQUENCE
    : 'u' HEX HEX HEX HEX
    ;

fragment NEWLINE
    : '\r\n'
    | [\r\n\u2028\u2029]
    ;

WS
    : [ \t\n\r\u00A0\uFEFF\u2003]+ -> skip
    ;

之后编译项目就会生成J5LexerJ5Parser

接着创建一个server项目用于实现我们的语言服务。

plugins {
    alias(libs.plugins.kotlin.jvm)
    alias(libs.plugins.shadow)
}

repositories {
    mavenCentral()
}

dependencies {
    implementation(project(":language"))
    implementation(libs.lsp4j)
    testImplementation(kotlin("test"))
}

tasks.test {
    useJUnitPlatform()
}

tasks.jar {
    dependsOn(tasks.shadowJar)
}

首先我们需要实现一个LanguageServer接口。

package cn.enaium.j5.lsp

import org.eclipse.lsp4j.InitializeParams
import org.eclipse.lsp4j.InitializeResult
import org.eclipse.lsp4j.services.LanguageServer
import org.eclipse.lsp4j.services.TextDocumentService
import org.eclipse.lsp4j.services.WorkspaceService
import java.util.concurrent.CompletableFuture

/**
 * @author Enaium
 */
class J5LanguageServer : LanguageServer {
    override fun initialize(params: InitializeParams): CompletableFuture<InitializeResult> {
        TODO("Not yet implemented")
    }

    override fun shutdown(): CompletableFuture<in Any> {
        TODO("Not yet implemented")
    }

    override fun exit() {
        TODO("Not yet implemented")
    }

    override fun getTextDocumentService(): TextDocumentService {
        TODO("Not yet implemented")
    }

    override fun getWorkspaceService(): WorkspaceService {
        TODO("Not yet implemented")
    }
}

接着依次实现TextDocumentServiceWorkspaceService

package cn.enaium.j5.lsp

import org.eclipse.lsp4j.DidChangeTextDocumentParams
import org.eclipse.lsp4j.DidCloseTextDocumentParams
import org.eclipse.lsp4j.DidOpenTextDocumentParams
import org.eclipse.lsp4j.DidSaveTextDocumentParams
import org.eclipse.lsp4j.services.TextDocumentService

/**
 * @author Enaium
 */
class J5TextDocumentService : TextDocumentService {
    override fun didOpen(params: DidOpenTextDocumentParams) {
        TODO("Not yet implemented")
    }

    override fun didChange(params: DidChangeTextDocumentParams) {
        TODO("Not yet implemented")
    }

    override fun didClose(params: DidCloseTextDocumentParams) {
        TODO("Not yet implemented")
    }

    override fun didSave(params: DidSaveTextDocumentParams) {
        TODO("Not yet implemented")
    }
}
package cn.enaium.j5.lsp

import org.eclipse.lsp4j.DidChangeConfigurationParams
import org.eclipse.lsp4j.DidChangeWatchedFilesParams
import org.eclipse.lsp4j.services.WorkspaceService

/**
 * @author Enaium
 */
class J5WorkspaceService : WorkspaceService {
    override fun didChangeConfiguration(params: DidChangeConfigurationParams) {

    }

    override fun didChangeWatchedFiles(params: DidChangeWatchedFilesParams) {

    }
}

实现initialize方法,这个方法主要是需要返回我们这个语言服务器为支持什么功能。

override fun initialize(params: InitializeParams): CompletableFuture<InitializeResult> {
    return CompletableFuture.completedFuture(InitializeResult(ServerCapabilities().apply {
        setTextDocumentSync(TextDocumentSyncOptions().apply {
            openClose = true
            change = TextDocumentSyncKind.Full
            setSave(SaveOptions().apply {
                includeText = true
            })
        })
        semanticTokensProvider = SemanticTokensWithRegistrationOptions().apply {
            legend = SemanticTokensLegend().apply {
                tokenTypes = SemanticType.entries.map { it.type }
            }
            setFull(true)
        }
    }))
}

首先任何一个语言服务都需要具备这个文档同步功能,这个功能会在打开关闭修改和保存文件是触发。之后是提供语义,提供语义之后,IDE就可以根据这个语义来实现代码高亮。

我们需要定义一个SemanticType枚举类。

enum class SemanticType(val id: Int, val type: String) {
    COMMENT(0, "comment"),
    KEYWORD(1, "keyword"),
    FUNCTION(2, "function"),
    STRING(3, "string"),
    NUMBER(4, "number"),
    DECORATOR(5, "decorator"),
    MACRO(6, "macro"),
    TYPE(7, "type"),
    TYPE_PARAMETER(8, "typeParameter"),
    CLASS(9, "class"),
    VARIABLE(10, "variable"),
    PROPERTY(11, "property"),
    STRUCT(12, "struct"),
    INTERFACE(13, "interface"),
    PARAMETER(14, "parameter"),
    ENUM_MEMBER(15, "enumMember"),
    NAMESPACE(16, "namespace"),
}

之后实现一下剩余的方法。

override fun shutdown(): CompletableFuture<Any> {
    return CompletableFuture.completedFuture(true)
}
override fun exit() {
}
override fun getTextDocumentService(): TextDocumentService
    return J5TextDocumentService()
}
override fun getWorkspaceService(): WorkspaceService {
    return J5WorkspaceService()
}

然后实现代码同步功能。

val cache = mutableMapOf<String, String>()

override fun didOpen(params: DidOpenTextDocumentParams) {
    cache[params.textDocument.uri] = params.textDocument.text
}
override fun didChange(params: DidChangeTextDocumentParams) {
    cache[params.textDocument.uri] = params.contentChanges[0].text
}
override fun didClose(params: DidCloseTextDocumentParams) {
    cache.remove(params.textDocument.uri)
}
override fun didSave(params: DidSaveTextDocumentParams) {
    cache[params.textDocument.uri] = params.text
}

接着我们需要再在J5TextDocumentService的实现类中实现一个semanticTokensFull方法。

override fun semanticTokensFull(params: SemanticTokensParams): CompletableFuture<SemanticTokens> {
    val document = cache[params.textDocument.uri] ?: return CompletableFuture.completedFuture(SemanticTokens())
    val data = mutableListOf<Int>()
    var previousLine = 0
    var previousChar = 0
    val j5Lexer = J5Lexer(CharStreams.fromString(document))
    val token = CommonTokenStream(j5Lexer)
    token.fill()
    token.tokens.forEach { token ->
        val semanticType = when (token.type) {
            J5Lexer.STRING -> SemanticType.STRING
            J5Lexer.NUMBER -> SemanticType.NUMBER
            J5Lexer.NUMERIC_LITERAL -> SemanticType.NUMBER
            J5Lexer.LITERAL -> SemanticType.KEYWORD
            J5Lexer.SINGLE_LINE_COMMENT -> SemanticType.COMMENT
            J5Lexer.MULTI_LINE_COMMENT -> SemanticType.COMMENT
            J5Lexer.IDENTIFIER -> SemanticType.VARIABLE
            J5Lexer.SYMBOL -> SemanticType.KEYWORD
            else -> return@forEach
        }
        token.text.split("\n").forEachIndexed { index, s ->
            val start = Position(token.line - 1, token.charPositionInLine)
            val currentLine = start.line + index
            val currentChar = if (index == 0) start.character else 0
            data.add(currentLine - previousLine)
            data.add(if (previousLine == currentLine) currentChar - previousChar else currentChar)
            data.add(s.length)
            data.add(semanticType.id)
            data.add(0)
            previousLine = currentLine
            previousChar = currentChar
        }
    }
    return CompletableFuture.completedFuture(SemanticTokens(data))
}

最后我们需要创建一个主方法来启动我们的语言服务。

fun main() {
    val server = J5LanguageServer()
    val launcher = Launcher.createLauncher(server, LanguageClient::class.java, System.`in`, System.out)
    launcher.startListening()
}

测试

新建一个后缀为j5的文件,然后输入以下内容。

{
  /* play with comments
  {  true, NaN   ] , {}* / aaa{}


  // make sure we included all \p{L},
  yes, json5, and ECMAScript 5+ supports them
//*/
  全世界无产者: "联合起来",
  n1: 1e2,
  n2: 0.2e-4,
  // May not works in some poor IDE
  // but works in official parser
  Infinity: -Infinity,
  NaN: -NaN,
  true: true,
  false: false,
  // yes, it works in their parser too
  一: "Unicode!"
}

// comment ends with eof

之后我这里使用neovim来测试,确保你已经安装了lspconfig

· 在init.lua中添加以下内容。

vim.cmd [[au BufRead,BufNewFile *.j5                set filetype=J5]]

local lsp = require('lspconfig')
local lsp_config = require('lspconfig.configs')

lsp_config.j5 = {
    default_config = {
        cmd = { 'java', '-cp', 'D:/Projects/teaching-lsp/server/build/libs/server-1.0-SNAPSHOT-all.jar', 'cn.enaium.j5.lsp.MainKt' },
        filetypes = { 'J5' },
	root_dir = function(fname)
            return lsp.util.root_pattern('*.j5')(fname)
        end,
    }
}

lsp_config.j5.setup {}

neovim

源码