Implementing the langserver protocol for RQL

One of our next projects for CubicWeb and its ecosystem is to implement the langserver protocol for the RQL language that we use to query the data stored in CubicWeb. The langserver protocol is an idea to solve one problem: to integrate operations for various languages, most IDEs/tools need to reinvent the wheel all the time, writing custom plugins, etc. To solve this issue, the protocol was invented with one idea: make one server for a language, and every IDE/tool that speaks this protocol will be able to integrate it easily.

language server protocol matrix illustration

So the idea is simple: let's build our own server for RQL so we'll be able to integrate it everywhere and build tools for it.

Since RQL has similarities with GraphQL, one of the goals is to have something similar to GraphiQL, which is used for example by GitHub to expose their API at https://developer.github.com/v4/explorer/

So this post has several objectives:

gather people who would be motivated to work on that subject, for now there is Laurent Wouters and me :)
explain to you in more detail (but not in full) how the language server protocol works
show what already exists for both langserver in Python and RQL
show the first roadmap we've discussed with Laurent Wouters on how we think we can do that :)
be a place to discuss this project, things aren't fixed yet :)

So, what is the language server protocol (LSP)?

It's a JSON-RPC based protocol where the IDE/tool talks to the server. JSON-RPC, simply put, is a bidirectional remote procedure call protocol encoded in JSON.

In this protocol you have 2 kinds of exchanges:

requests: where the client asks the server (or the server asks the client) something and a reply is expected. For example: where is the definition of this function?
notifications: the same but without an expected reply. For example: linting information or error detection
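
To make the difference between the two kinds of exchanges concrete, here is a minimal sketch in Python of how such messages could be built and framed. The `Content-Length` header framing and the method names `textDocument/definition` and `textDocument/publishDiagnostics` come from the LSP specification; the helper names (`encode_message`, `decode_message`, `is_request`, `is_notification`) and the file URI are made up for this illustration.

```python
import json


def encode_message(payload: dict) -> bytes:
    """Frame a JSON-RPC payload LSP-style: header, blank line, JSON body."""
    body = json.dumps(payload).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_message(data: bytes) -> dict:
    """Parse one framed message (assumes the whole message is in `data`)."""
    header, _, body = data.partition(b"\r\n\r\n")
    length = None
    for line in header.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    return json.loads(body[:length].decode("utf-8"))


def is_request(message: dict) -> bool:
    """A request has a method and an id: a reply is expected."""
    return "method" in message and "id" in message


def is_notification(message: dict) -> bool:
    """A notification has a method but no id: no reply is expected."""
    return "method" in message and "id" not in message


# Request: "where is the definition of this symbol?"
definition_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "textDocument/definition",
    "params": {
        "textDocument": {"uri": "file:///tmp/query.rql"},  # hypothetical file
        "position": {"line": 0, "character": 4},
    },
}

# Notification: the server pushes linting results, nobody answers.
diagnostics_notification = {
    "jsonrpc": "2.0",
    "method": "textDocument/publishDiagnostics",
    "params": {"uri": "file:///tmp/query.rql", "diagnostics": []},
}
```

The only structural difference between the two messages is the `id` field, which is what lets the receiver match a reply to its request.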

The LSP specification has 3 big categories:

everything about initializing/shutting down the server, etc.
everything regarding text and workspace synchronization between the server and the client
the actual things that interest us: a list of language features that the server supports (you are not obliged to implement everything)
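
As a rough illustration of how these categories fit together, here is a small Python sketch of a server core that answers `initialize` by advertising only the features it actually implements. The method names, the `capabilities` reply, the `textDocumentSync` value 1 (full document sync), `hoverProvider` and the JSON-RPC error code -32601 come from the LSP and JSON-RPC specifications; the `MiniServer` class and its methods are hypothetical and are not taken from any existing implementation.

```python
from typing import Any, Callable, Dict, Optional


class MiniServer:
    """Hypothetical server core: lifecycle handling plus a handler registry."""

    def __init__(self) -> None:
        self.handlers: Dict[str, Callable[[dict], Any]] = {}
        self.shutdown_requested = False
        self.register("initialize", self.on_initialize)
        self.register("shutdown", self.on_shutdown)

    def register(self, method: str, handler: Callable[[dict], Any]) -> None:
        self.handlers[method] = handler

    def on_initialize(self, params: dict) -> dict:
        # Only advertise the language features that have a handler.
        return {
            "capabilities": {
                "textDocumentSync": 1,  # 1 = full document sync
                "hoverProvider": "textDocument/hover" in self.handlers,
            }
        }

    def on_shutdown(self, params: dict) -> None:
        self.shutdown_requested = True
        return None  # the spec expects a null result

    def handle(self, message: dict) -> Optional[dict]:
        """Dispatch one decoded message; return the reply, if any."""
        method = message.get("method")
        handler = self.handlers.get(method)
        expects_reply = "id" in message
        if handler is None:
            if not expects_reply:
                return None  # unknown notifications are ignored in this sketch
            return {
                "jsonrpc": "2.0",
                "id": message["id"],
                "error": {"code": -32601, "message": f"Method not found: {method}"},
            }
        result = handler(message.get("params") or {})
        if not expects_reply:
            return None
        return {"jsonrpc": "2.0", "id": message["id"], "result": result}
```

With this shape, supporting one more language feature is a matter of calling `register` with the corresponding method name, and the client learns about it through the `initialize` reply.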

Here is the simplified list of possible language features that the website presents:

Code completion
Hover
Jump to def
Workspace symbols
Find references
Diagnostics

The specification is much more detailed but far less easy to digest (look at the "language features" in the right-hand menu for more details):

completion/completion resolve
hover (when you put your cursor on something)
signatureHelp
declaration (go to...)
definition (go to...)
typeDefinition (go to...)
implementation (go to...)
references
documentHighlight (highlight all references to a symbol)
documentSymbol ("symbol" is a generic term for variable, definitions etc...)
codeAction (this one is interesting)
codeLens/codeLens resolve
documentLink/documentLink resolve
documentColor/colorPresentation (stuff about picking colors)
formatting/rangeFormatting/onTypeFormatting (set tab vs space)
rename/prepareRename
foldingRange

(Comments reflect my current understanding of the spec; they might not be perfect)

The one that is really interesting here (but not our priority right now) is "codeAction": it's basically a generic entry point for every refactoring kind of operation, as some examples from the spec show:

Example extract actions:

Extract method
Extract function
Extract variable
Extract interface from class

Example inline actions:

Inline function
Inline variable
Inline constant

Example rewrite actions:

Convert JavaScript function to class
Add or remove parameter
Encapsulate field
Make method static
Move method to base class

I don't expect us to have a direct need for it, but it really seems like one to keep in mind.

One question that I frequently got was: is syntax highlighting included in the langserver protocol? Having double-checked with Laurent Wouters, it's actually not the case (I thought documentSymbol could be used for that, but actually no).

But we already have an implementation for that in pygments: https://hg.logilab.org/master/rql/file/d30c34a04ebf/rql/pygments_ext.py

What currently exists for LSP in Python and RQL

The state of things is not great in the Python ecosystem, but it's not a disaster either. So far I haven't been able to find any generic Python implementation of LSP that we could really reuse and integrate.

There are, right now and to my knowledge, only 2 maintained implementations of LSP in Python. One for Python and one for ... Fortran x)

Palantir's one makes extensive use of advanced magic code that doesn't seem really necessary, but it is probably of higher code quality, since the Fortran one doesn't seem very idiomatic, although it looks much simpler.

So we'll either need to extract the needed code from one of those or implement our own, which is not so great.

On the RQL side, everything that seems to be useful for our current situation is located in the RQL package that we maintain: https://hg.logilab.org/master/rql

Roadmap

After a discussion with Laurent Wouters, a first roadmap looks like this:

extract the code from either the Palantir or the Fortran LSP implementation and come up with a generic implementation (I'm probably going to do it, but Laurent told me he is going to take a look too). When I talk about a generic implementation, I mean everything listed in the big categories of the protocol that isn't related to language features, which we don't really want to rewrite again.

Once that's done, start implementing the language features for RQL:

the easiest is the syntax error detection code: we just need to run the parser on the code and handle the potential errors
display them with a reasonably precise red underline
play with the RQL AST to extract the symbols and start doing things like codeLens and hover
much more complex (and for later): autocompletion (we'll either need a partial compiler or to modify the current one for that)
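
To give an idea of the first step, here is a minimal Python sketch of turning a parser error into an LSP diagnostic, which is what editors render as a red underline. The diagnostic shape (`range`, `severity` with 1 meaning error, `message`, zero-based positions) and the `textDocument/publishDiagnostics` notification come from the LSP specification. Everything else is an assumption for the sake of the example: `toy_parse` and `ToySyntaxError` are stand-ins that only check parentheses, and they are not the API of the real RQL parser, whose error reporting may look different.

```python
class ToySyntaxError(Exception):
    """Stand-in for whatever the real parser raises (hypothetical)."""

    def __init__(self, message: str, line: int, column: int) -> None:
        super().__init__(message)
        self.line = line      # 1-based, as most parsers report
        self.column = column  # 1-based


def toy_parse(text: str) -> None:
    """Toy check, NOT the RQL grammar: only verifies balanced parentheses."""
    open_positions = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col_no, char in enumerate(line, start=1):
            if char == "(":
                open_positions.append((line_no, col_no))
            elif char == ")":
                if not open_positions:
                    raise ToySyntaxError("unexpected ')'", line_no, col_no)
                open_positions.pop()
    if open_positions:
        line_no, col_no = open_positions[-1]
        raise ToySyntaxError("unclosed '('", line_no, col_no)


def to_diagnostics(text: str, parse=toy_parse) -> list:
    """Run the parser and convert a failure into LSP diagnostics."""
    try:
        parse(text)
    except ToySyntaxError as exc:
        start = {"line": exc.line - 1, "character": exc.column - 1}  # LSP is 0-based
        end = {"line": exc.line - 1, "character": exc.column}
        return [{
            "range": {"start": start, "end": end},
            "severity": 1,  # 1 = Error
            "source": "rql",
            "message": str(exc),
        }]
    return []


def publish_diagnostics(uri: str, text: str) -> dict:
    """Build the notification the server would send to the client."""
    return {
        "jsonrpc": "2.0",
        "method": "textDocument/publishDiagnostics",
        "params": {"uri": uri, "diagnostics": to_diagnostics(text)},
    }
```

Swapping `toy_parse` for the real parser would mostly mean adapting the `except` clause to its exception type and to however it exposes the error position.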

Side note

To better understand the motivation behind this move: it is part of the broader move to drop the "Web" from CubicWeb and replace the whole current front-end implementation with reactjs+typescript views. In this context CubicWeb (or Cubic?) will only serve as a backend provider with which we will talk in... RQL! Therefore writing and using RQL will be much more important than it is right now.