Which parser generator are you using (if any)?

My favorite tool is GOLD Parser; I really like the separation between the engine and the parser generator, which allows end users to create custom engines targeting whichever language they like. Since GOLD only generates the parse tables, and doesn’t provide language-specific functions to enhance the grammars, both GOLD and its grammars are 100% language-agnostic (in fact, third parties have created many custom engines targeting different languages over the years).

Unfortunately, the project came to an abrupt stop many years ago and was never completed: only the Windows version is available, the source code was never released, and the pseudocode example showing how to write an engine was never written. I’ve attempted to contact its author (Devin Cook) a couple of times, but to no avail.

The fact that it was never ported to other OSs is the main limitation of GOLD (although, obviously, the generated parser can be used on any platform).

I’d really like to see the GOLD project revived. Many GOLD users have since migrated to ANTLR, but reported they would rather use GOLD if the project were revived.

The second parser generator on my list is Lemon, which generates parsers in C but has also been adapted to generate parsers in other languages.

In case you haven’t come across it, this is an open-source Java parser written for ANTLR 4. You can find other parsers there too: https://github.com/antlr/grammars-v4/tree/master/java

When I first dipped my toe into parsing I wrote my own, taking a tutorial parser written in Pascal and re-implementing it in C#. I can no longer find the tutorial (too many laptops later, and poor note-taking on my part).

The parser didn’t do a whole lot of anything, other than familiarize me with the concepts.

Subsequently…ANTLR. 'Cause it…as that horrible phrase goes…“just works.” Or just works well enough to do the job(s) I’ve needed it to do. I’ve targeted Python and C# (like the latter because it’s fast and rigorous, like the former because it’s fast to develop, and because you can add elements to a class by…just adding them as needed.) Java would be a good target, except my Java is too far back in my history to be productive anymore.

Once I got my head around ANTLR’s Visitor class, allowing me to build the ASTs I needed, inertia has ensured that I’m in the ANTLR camp for a while yet.


I am using a hand-crafted tokenizer, which is just a big finite state machine that eats one character at a time and generates tokens. The tokens are then parsed using a Pratt parser. I started with the classic Wirth recursive-descent parser, but it is extremely inefficient on grammars that go in multiple directions, whereas the Pratt parser is a miracle of simplicity: its core is only about 10 lines long.
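To make the idea concrete, here is a minimal sketch of a character-at-a-time tokenizer feeding a Pratt parser for arithmetic. It is not the poster's code: the names `BINDING_POWER`, `tokenize`, and `parse`, the binding-power values, and the tuple-based tree are all assumptions made for illustration.

```python
# Hypothetical binding powers for this sketch; a higher number binds tighter.
BINDING_POWER = {"+": 10, "-": 10, "*": 20, "/": 20}
PREFIX_MINUS_POWER = 30  # assumed: unary minus binds tighter than any infix operator


def tokenize(text):
    """Two-state scanner: either inside a number or not, one character at a time."""
    tokens, number = [], ""
    for ch in text:
        if ch.isdigit():
            number += ch
            continue
        if number:  # leaving the "in number" state
            tokens.append(number)
            number = ""
        if ch in "+-*/()":
            tokens.append(ch)
        elif not ch.isspace():
            raise ValueError(f"unexpected character {ch!r}")
    if number:
        tokens.append(number)
    return tokens


def parse(tokens, min_bp=0):
    """Parse an expression, consuming only infix operators that bind tighter than min_bp."""
    token = tokens.pop(0)
    if token == "(":
        left = parse(tokens, 0)
        tokens.pop(0)  # discard the closing ")"
    elif token == "-":
        left = ("neg", parse(tokens, PREFIX_MINUS_POWER))
    else:
        left = int(token)
    # The core Pratt loop: keep extending `left` while the next operator binds tightly enough.
    while tokens and BINDING_POWER.get(tokens[0], 0) > min_bp:
        op = tokens.pop(0)
        right = parse(tokens, BINDING_POWER[op])
        left = (op, left, right)
    return left
```

For example, `parse(tokenize("1 + 2 * 3"))` yields `("+", 1, ("*", 2, 3))`. Passing the operator's own binding power into the recursive call is what makes equal-precedence operators left-associative in this sketch; error handling for malformed input is deliberately left out.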

ANTLR takes a lot of effort to learn, and in my experiments it was not amenable to indentation-sensitive languages. Given how simple Pratt parsing is, ANTLR is not worth the effort to learn, IMHO. There are a bunch of lectures on YouTube by Crockford and others about how to use the Pratt parser.

I was very disappointed to buy a $70 book on parsing from Springer-Verlag, only to find that it doesn’t even mention the Pratt method. I think the academics enjoy doing things the hard way, and have ignored the drastic simplification that is available.


@CodingFiend’s reply popped this thread up so I saw it. I used to roll things by hand, just code. Then I tried VisualParse++ (many years ago, if anyone remembers it), and in 2001 I found Rebol, which has a PEG parsing DSL built into it. I now work on a Rebol descendant called Red, which carries that feature forward. It’s still just a basic engine, without the high-level features ANTLR and other tools offer, but we have plans for things in that area. One very nice feature is that because Red is a data language, you can parse either at the character level or at the value level. That is, you can let Red’s tokenizer load values and then match them by datatype, still using PEG rules. For example, given the inputs:

send gregg@red-lang.org USD$100 on 15-Sep-2021 at 12:34
send USD$100 to gregg@red-lang.org on 15-Sep-2021

You could parse them both with this rule:

['send [email! money! | money! 'to email!] 'on date! opt ['at time!]]

You can also include actions in the rule to extract the actual data.
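For readers who don't know Red, here is a rough Python analogue of value-level matching. It is not how Red's `parse` is implemented: the regex classifiers in `PATTERNS`, the rule encoding in `RULE`, and the helper names `load`, `match`, and `parse_command` are all assumptions invented for this sketch.

```python
import re

# Hypothetical stand-ins for Red's tokenizer; each regex only approximates the datatype.
PATTERNS = [
    ("email", re.compile(r"[^@\s]+@[^@\s]+")),
    ("money", re.compile(r"[A-Z]{3}\$\d+(\.\d+)?")),
    ("date", re.compile(r"\d{1,2}-[A-Za-z]{3}-\d{4}")),
    ("time", re.compile(r"\d{1,2}:\d{2}")),
]

# Assumed encoding of the rule above: "x!" is a datatype, a plain string is a
# literal word, ("alt", [...]) is ordered choice, ("opt", [...]) is optional.
RULE = [
    "send",
    ("alt", [["email!", "money!"], ["money!", "to", "email!"]]),
    "on", "date!",
    ("opt", ["at", "time!"]),
]


def load(text):
    """Turn whitespace-separated input into (kind, text) values."""
    values = []
    for word in text.split():
        kind = next((name for name, pat in PATTERNS if pat.fullmatch(word)), "word")
        values.append((kind, word))
    return values


def match(rule, values, pos, out):
    """Match a sequence of rule items from pos; return the new position or None."""
    for item in rule:
        if isinstance(item, tuple):
            tag, body = item
            branches = body if tag == "alt" else [body]
            for branch in branches:
                trial = dict(out)  # captures from a failed branch are discarded
                end = match(branch, values, pos, trial)
                if end is not None:
                    out.update(trial)
                    pos = end
                    break
            else:
                if tag == "alt":  # an "opt" that fails simply matches nothing
                    return None
            continue
        if pos >= len(values):
            return None
        kind, text = values[pos]
        if item.endswith("!"):
            if kind != item[:-1]:
                return None
            out[kind] = text  # the "action": record the matched value by datatype
        elif (kind, text) != ("word", item):
            return None
        pos += 1
    return pos


def parse_command(text):
    """Return the extracted fields, or None if the whole input does not match RULE."""
    values, out = load(text), {}
    return out if match(RULE, values, 0, out) == len(values) else None
```

Both sample inputs match: the first fills `email`, `money`, `date`, and `time`, while the second fills everything except `time`. The point of the sketch is only that the rule works on typed values rather than characters, so the two word orders need no character-level lookahead.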


I use Menhir (an LR(1), yacc-style parser generator for OCaml) and Tree-sitter from GitHub (a GLR parser generator).