Hi,
I’m also not an expert on the topic but have some experience in building parser libraries.
There are a lot of different options and consideration when choosing the right parsing algorithm so it is hard to explain all the pros and cons of different approaches in a single post but one thing seems certain “there is no silver bullet!”.
A parser combinators approach is a top-down approach which is appealing as it is easy to understand, debug and reason about. It is probably the way to go if you are building a parser from scratch. But, as all top-down approaches it inherently suffers from being unable to support left recursive rules. You can think of this approach as recursive descent and a good schematic description would be PEG grammars as the choice operator is always ordered (you are trying alternatives in the order they are referenced). These style of parsing can support a nice error reporting. E.g. I have implemented Arpeggio library which uses a variation of this approach and it reports errors quite well, but the problematic part is error recovery as the part of the parser state is on the stack (as you are using recursive function calls) so you can’t manipulate the stack easily to put the parser in an arbitrary state during recovery.
Parsing approaches with explicit state handling (e.g. LL/LR and its variations) can support error recovery easily (and thus can report multiple errors in one go).
Another difference worth mentioning is that bottom up parsing approaches allow left-recursive rules so the grammars tend to be more natural. E.g. it was always much easier for me to read:
expr = expr + expr | expr * expr | ( expr ) | int
than:
expr = term + expr | term
term = factor * term | factor
term = ( expr ) | int
First grammar can’t be implemented with a naive recursive descent parsing approach. Another problem with the first grammar is that it doesn’t encode priority so the language is ambiguous (is 2+3*5
equal 17 or 25?). So either you will use this kinda cryptic style of encoding priorities or your parser generator can use an extended version of BNF spec to allow these information specified declaratively, like in
parglare:
expr: expr + expr {left, 1} | expr * expr {left, 2} | ( expr ) | int
where associativity and priority is declared inside {}
.
I tend to prefer declarative approach as I think a good DSL (which a grammar definition language is!) should take away thinking about the inner workings of the interpreter/compiler. User should be free to specify the solution without these constraints.
Hope this helps a bit. I would be glad to extend the discussion in a direction that interest you the most but as I said I’m not an expert and have been exposed to only some of the parsing algorithm in a greater extent so all this should be taken with a grain of salt.