How to test a DSL?

Hi all,

I am developing a DSL and wanted to write some tests for the parser, but I am not sure how many tests, or what, is “enough”. At the moment I have a couple of tests that parse small snippets and just check that there are no parse errors. I am starting to write another set of tests that parse a file, find specific instances in the AST, and check that their properties are correctly set according to the CS.
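For illustration, here is the shape such a test takes for me in Python. This is just a sketch: `parse`, `Model`, and `VarDecl` are hypothetical stand-ins for whatever API the real parser exposes, and the tiny fake grammar only exists to make the example self-contained.

```python
# Sketch of an "AST property" parser test.
# `parse`, `Model`, and `VarDecl` are hypothetical stand-ins for a real
# parser API; the fake grammar handles lines of the form "var <name>: <type>;".

from dataclasses import dataclass, field

@dataclass
class VarDecl:
    name: str
    type_name: str

@dataclass
class Model:
    declarations: list = field(default_factory=list)

def parse(source: str) -> Model:
    # Placeholder for the real parser, just enough to make the test runnable.
    model = Model()
    for line in source.splitlines():
        line = line.strip().rstrip(";")
        if line.startswith("var "):
            name, type_name = line[4:].split(":")
            model.declarations.append(VarDecl(name.strip(), type_name.strip()))
    return model

def test_var_decl_properties():
    # Parse a snippet, locate a node in the AST, and assert on its properties.
    model = parse("var x: Integer;")
    decl = model.declarations[0]
    assert decl.name == "x"
    assert decl.type_name == "Integer"

test_var_decl_properties()
```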

Is this a good approach? And, more importantly, how complete should the tests be? Should I have a test file that exercises all the language constructs?

Any pointers are welcome.

“It depends…” is the only right answer.

What you’re already doing sounds good for testing the parsing part of a DSL. There are more aspects to a DSL, though, such as: constraints (possibly with an underlying type system), generation/interpretation (“semantics”), editor support (content assist), etc.

In general, a couple of well-crafted “end-to-end”-style tests work wonders for being alerted to any problem you introduce. Especially when generating code from DSL content, there is a “fan-out” effect: any problem with the surface area of the DSL (the parser, in your case) is almost guaranteed to show itself in some form in the generated code.


Agreed on “it depends…”, as the type of language and its expected usage influence a lot. We have had good results in several cases when applying a test-driven approach, as detailed e.g. in our DSM workshop demo paper.


Thanks both for the comments! I guess I will continue with my current approach and take notes from the paper you shared.

Thanks for sharing your paper, @juha-pekka!


Hi, everyone!

First of all, I’d like to thank @horacio.hoyos for bringing this up, and @meinte.boersma and @juha-pekka for your insights. I also agree with “it depends…”. I haven’t yet read the paper shared by @juha-pekka, but what I usually do comes from my experience with SableCC; lately, tree-sitter has given me a different perspective on TDD. Although tree-sitter is mainly focused on editors, it gives us an idea of how one can do TDD with parser generators.

Since SableCC is a framework that generates the lexical analyzer, the parser, the CST (by default) or AST (when customized), and the corresponding visitors, I usually write tests for the lexical analyzer, while the parser tests are based only on the AST: I write some test code, build the expected AST by hand, and compare the generated AST against the expected one using object comparison (in Java, Python, and Go).
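In Python terms, the “build the expected AST by hand and compare” style can be sketched like this; dataclasses give a deep structural `==` for free, so comparing whole trees is one assertion. The node classes and `parse` here are hypothetical stand-ins, not SableCC’s generated classes:

```python
# Sketch of hand-built-expected-AST testing with deep object comparison.
# Node classes and `parse` are hypothetical stand-ins; dataclasses generate
# __eq__ that compares field by field, recursively through nested nodes.

from dataclasses import dataclass

@dataclass
class Identifier:
    name: str

@dataclass
class ProgramHeading:
    name: Identifier

@dataclass
class Program:
    heading: ProgramHeading

def parse(source: str) -> Program:
    # Placeholder parser for input of the form 'program <name>;'.
    name = source.split()[1].rstrip(";")
    return Program(ProgramHeading(Identifier(name)))

def test_program_heading():
    # Build the expected tree by hand, then compare whole trees at once.
    expected = Program(ProgramHeading(Identifier("test")))
    assert parse("program test;") == expected

test_program_heading()
```

The nice part of this setup is that a failing comparison points at the whole subtree, and good `repr` output (also free with dataclasses) makes the diff readable.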

As for tree-sitter, it outputs S-expressions, and one writes the tests in text files: the input together with the expected AST in S-expression format, which makes the tests fairly easy to write.

I have only been working with tree-sitter for a week and a half, and there is one important lesson I took from it from a TDD point of view: develop the grammar in increments, either top-down or bottom-up. I think either approach will work, but I guess it also depends on personal experience and preference.

As an example, let’s look at an extract from a Pascal grammar:

program = program_heading declarations body "." .
program_heading = "program" identifier ";" .

Now, the incremental approach I was referring to would start as follows:

program = program_heading body "." .
program_heading = "program" identifier ";" .
body = "begin" "end" .

From this, I would write the test like this:

Simplest Program

program test;
begin
end.

After this, I move on to implement the grammar in tree-sitter and make sure that the test passes. Once this is done, I expand the grammar, write a new test for it, implement it in tree-sitter, test, refactor, etc.
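For completeness, a tree-sitter corpus test file pairs the named input with the expected tree as an S-expression after a `---` separator. The node names below are assumptions for this sketch; they depend on how the grammar rules end up being named:

```
==================
Simplest Program
==================

program test;
begin
end.

---

(program
  (program_heading
    (identifier)))
```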

I find this works well with tree-sitter, and I guess it should also work with other automated tools. However, I’d like to hear about other approaches you may have, particularly for recursive descent parsers.

Do you have other approaches you can share?

Once again, thank you all for sharing your insights.


Fidel H Viegas

I think the “build the AST by hand and compare” approach is somewhat similar to what Federico proposed in Building & Testing a Parser, with the difference that he uses a generator to build the expected result.

I wonder whether automatic generation of test cases via (automated) CST generation and/or mutation testing is really needed for a DSL.
