Hi, everyone!
First of all, I’d like to thank @horacio.hoyos for bringing this up, and @meinte.boersma and @juha-pekka for your insights. I also agree with “it depends…”. I haven’t yet read the paper shared by @juha-pekka, but I can describe what I usually do, which comes from my experience with SableCC; lately, tree-sitter has also shown me a different approach to TDD. Although tree-sitter is mainly focused on editors, it gives us an idea of how one can do TDD with parser generators.
SableCC is a framework that generates the lexical analyzer, the parser, the CST (by default) or AST (when customized), and the corresponding visitors. With it, I usually write tests for the lexical analyzer directly, while the parser tests are based only on the AST: I write some sample code, build the expected AST by hand, and compare it against the generated AST using object comparison (in Java, Python, and Go).
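To make the object-comparison idea concrete, here is a minimal sketch in Python. The node classes and the `parse` stub are purely illustrative stand-ins for what a parser generator would emit; only the shape of the test matters:

```python
from dataclasses import dataclass

# Hypothetical AST node classes -- stand-ins for generated code.
@dataclass(frozen=True)
class Identifier:
    name: str

@dataclass(frozen=True)
class ProgramHeading:
    identifier: Identifier

@dataclass(frozen=True)
class Program:
    heading: ProgramHeading

def parse(source: str) -> Program:
    # Placeholder for the generated parser; hard-wired here just to
    # show the shape of the test, not a real implementation.
    name = source.split()[1].rstrip(";")
    return Program(ProgramHeading(Identifier(name)))

def test_simplest_program():
    # Build the expected AST by hand, then compare it against the
    # parser's output; dataclasses give structural equality for free.
    expected = Program(ProgramHeading(Identifier("test")))
    actual = parse("program test; begin end.")
    assert actual == expected
```

The same pattern carries over to Java (`equals`/`hashCode`) and Go (`reflect.DeepEqual` or a custom comparison).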
Tree-sitter, on the other hand, produces S-expressions, and tests are written in plain text files with the expected AST in S-expression format, which makes the tests quite easy to write.
I have only been working with tree-sitter for a week and a half, but there is one important lesson I have taken from it from a TDD point of view: develop the grammar in increments, either top-down or bottom-up. I think either approach will work, but it also depends on personal experience and preference.
As an example, let’s look at an extract from a Pascal grammar:
program = program_heading declarations body "." .
program_heading = "program" identifier ";" .
Now, the incremental approach I was referring to would start as follows:
program = program_heading body "." .
program_heading = "program" identifier ";" .
body = "begin" "end" .
From this, I would write the test like this:
=================
Simplest Program
=================
program test;
begin
end.
---
(program
(program_heading
(identifier))
(body))
After this, I move on to implement the grammar in tree-sitter and make sure that the test passes. Once this is done, I expand the grammar, write a new test for it, implement it in tree-sitter, test, refactor, and so on.
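That implementation step might look roughly like this in tree-sitter’s `grammar.js` DSL (a sketch of the first increment above; the `identifier` regex is my own simplification):

```javascript
// grammar.js -- sketch of the first grammar increment.
// Uses tree-sitter's DSL; processed by `tree-sitter generate`.
module.exports = grammar({
  name: "pascal",

  rules: {
    program: $ => seq($.program_heading, $.body, "."),

    program_heading: $ => seq("program", $.identifier, ";"),

    // Smallest possible body for the first increment.
    body: $ => seq("begin", "end"),

    identifier: $ => /[a-zA-Z_][a-zA-Z0-9_]*/,
  },
});
```

With the test file placed in the grammar’s test corpus directory, `tree-sitter generate` followed by `tree-sitter test` runs the cycle.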
I find this works well with tree-sitter, and I suspect it should also work with other automated tools. However, I’d like to hear about other approaches you may have, particularly for recursive-descent parsers.
Do you have other approaches you can share?
Once again, thank you all for sharing your insights.
Regards,
Fidel H Viegas