Designing a query language

Hello guys,

I’m Ochibobo. I’m new to language engineering/design, and I have a question about designing query languages.

Suppose I’m designing a database system (including the engine, the index and the query language), and I use Flex/Bison or even ANTLR to build my query language’s parser and tokenizer, but the engine itself is written in a different language from the one used for the query language. Eventually the two systems have to communicate, so I would have to design a communication interface between them.

Is there a preferred protocol to achieve this (the communication bit)?

If one opts for something similar to REST, is there a preferred format to use, such as sending a validated parse tree to the engine itself?

I’d appreciate your input on this matter.


You could consider using EMF in combination with XMI, or a JSON variant of XMI, to serialize models. See also: Polyglot Modeling/Metamodeling formats and frameworks

I’m not familiar with protocols that are specific to transporting parse trees: EMF/XMI are generic frameworks capable of transporting ASTs/models.
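EMF and XMI themselves are Java/Eclipse tooling, so what follows is not EMF code. It is a hedged Python sketch of the general idea behind a generic model serialization: every node records its type name, and the receiving side uses a shared registry (standing in here for a metamodel) to rebuild the tree. The node classes `Column` and `Select`, the `"type"` key and `REGISTRY` are hypothetical names chosen for illustration.

```python
import json
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, List

# Hypothetical AST node classes; any tree of dataclasses is handled the same way.
@dataclass
class Column:
    name: str

@dataclass
class Select:
    table: str
    columns: List[Column]

# Maps type names back to classes on the receiving side,
# playing the role a shared metamodel would play.
REGISTRY = {cls.__name__: cls for cls in (Column, Select)}

def to_generic(value: Any) -> Any:
    """Turn a dataclass tree into JSON-compatible data, tagging each node with its type."""
    if is_dataclass(value):
        data = {"type": type(value).__name__}
        for f in fields(value):
            data[f.name] = to_generic(getattr(value, f.name))
        return data
    if isinstance(value, list):
        return [to_generic(v) for v in value]
    return value  # str, int, float, bool and None pass through unchanged

def from_generic(data: Any) -> Any:
    """Rebuild the tree by looking up each node's "type" tag in the registry."""
    if isinstance(data, dict) and "type" in data:
        cls = REGISTRY[data["type"]]
        kwargs = {k: from_generic(v) for k, v in data.items() if k != "type"}
        return cls(**kwargs)
    if isinstance(data, list):
        return [from_generic(v) for v in data]
    return data

query = Select(table="users", columns=[Column("id"), Column("name")])
wire = json.dumps(to_generic(query))
print(wire)
# {"type": "Select", "table": "users", "columns": [{"type": "Column", "name": "id"}, {"type": "Column", "name": "name"}]}
assert from_generic(json.loads(wire)) == query
```

Because the serializer is reflective, supporting a new node class only means registering it; the trade-off is that both sides must agree on class and field names.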

Thanks @meinte.boersma

Well, the first option I’d consider would be to avoid the problem altogether: ANTLR has targets for many programming languages, so generating a parser directly in the target language would save you the effort.

If that’s not possible, either because you’re using a language that ANTLR doesn’t support, or because alongside the parser you want to use libraries (e.g. for building and validating the AST) that are only available in a language you haven’t used for the engine, then you need to exchange data in some format. There are no standard protocols that I know of. What we generally do is build an AST from the parse tree (the AST is closer to the domain, easier to check and manipulate, and often more compact, so it’s cheaper to transfer), then serialize it to JSON, either with a custom-built JSON format or with something like emfjackson/ecore.js for EMF, and deserialize it on the other end into a similar AST structure.
We’re building support for this in open-source libraries that we develop – such as Kolasu and @strumenta/ast – but we’re not there yet with a general solution.
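A minimal sketch of the custom-format route described above (build an AST, validate it, serialize it to JSON, then deserialize it on the engine side into a similar structure), assuming a hypothetical `SELECT ... FROM ... WHERE` AST. Both sides are written in Python only so the example runs on its own; in the scenario from the question, `select_from_json` would be reimplemented in the engine’s language. All class, function and key names (`Select`, `Comparison`, `validate`, `kind`) are illustrative and not part of Kolasu, @strumenta/ast or any other library.

```python
import json
from dataclasses import dataclass
from typing import List, Optional, Union

# --- Hypothetical AST for a tiny query language (same shape on both sides) ---
@dataclass
class Comparison:
    column: str
    op: str                 # e.g. "=", "<", ">"
    value: Union[int, str]

@dataclass
class Select:
    table: str
    columns: List[str]
    where: Optional[Comparison] = None

ALLOWED_OPS = {"=", "<", ">"}

def validate(q: Select) -> None:
    """Minimal semantic checks run on the AST before it is sent."""
    if not q.columns:
        raise ValueError("SELECT needs at least one column")
    if q.where is not None and q.where.op not in ALLOWED_OPS:
        raise ValueError(f"unsupported operator {q.where.op!r}")

# --- Parser side: AST -> custom JSON ---
def select_to_json(q: Select) -> str:
    payload = {"kind": "select", "table": q.table, "columns": q.columns}
    if q.where is not None:
        payload["where"] = {"column": q.where.column, "op": q.where.op, "value": q.where.value}
    return json.dumps(payload)

# --- Engine side: custom JSON -> equivalent AST ---
def select_from_json(text: str) -> Select:
    data = json.loads(text)
    if data.get("kind") != "select":
        raise ValueError(f"unexpected node kind {data.get('kind')!r}")
    where = data.get("where")
    return Select(
        table=data["table"],
        columns=list(data["columns"]),
        where=Comparison(**where) if where is not None else None,
    )

# In a real setup this AST would be built by walking the ANTLR parse tree
# for something like: SELECT id, name FROM users WHERE age > 30
query = Select("users", ["id", "name"], Comparison("age", ">", 30))
validate(query)
wire = select_to_json(query)
print(wire)
# {"kind": "select", "table": "users", "columns": ["id", "name"], "where": {"column": "age", "op": ">", "value": 30}}
assert select_from_json(wire) == query
```

The `kind` tag lets the engine reject payloads it doesn’t understand, and writing the mapping by hand keeps the wire format independent of how the AST classes are laid out on either side.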

Thanks @alessio.stalla