Functional/Combinator/Monadic Parsing contrasted with "traditional" parser generators?

tarilabs · February 5, 2020, 8:16pm

Hi all,
I have a very general question looking feedbacks as per $subject, and I thought about asking here

This is fresh off the presses of YouTube today: https://youtu.be/dDtZLm7HIJs

I find very interesting, especially the fact the grammar rules looks very adherent to the functional implementation.
Personally, I have just limited experience with “traditional” parser generators libraries such as Antlr.
I was wondering how is in general a feasible approach also to implement error-handling, heuristics, etc?

For instance, at the end of the video, an incorrect program is simply not parsed (an empty list is returned). If there are people with experience with this approach, what would be a general comment about this way of creating parsers?
In Antlr error reporting is provided ootb, while Antlr comes with other limitations, for instance if I recall correctly only support self-recursive and not mutually-left-recursive, etc. while looking at this video it seems the Combinator-Parsing approach is not limited in this way.

If somebody have some experience and/or feedback to share, I would appreciate. Thanks!

digital-ember · February 6, 2020, 7:10am

Hello. I’m certainly not an expert on the topic, but maybe you find some answers looking at the Elm Parser library, which focuses on “excellent error messages”: https://package.elm-lang.org/packages/elm/parser/latest

Especially in the “Advanced” module they have nice ideas how to make rich error messages possible by tracking context.

igor.dejanovic · February 6, 2020, 11:23am

Hi,

I’m also not an expert on the topic but have some experience in building parser libraries.

There are a lot of different options and consideration when choosing the right parsing algorithm so it is hard to explain all the pros and cons of different approaches in a single post but one thing seems certain “there is no silver bullet!”.

A parser combinators approach is a top-down approach which is appealing as it is easy to understand, debug and reason about. It is probably the way to go if you are building a parser from scratch. But, as all top-down approaches it inherently suffers from being unable to support left recursive rules. You can think of this approach as recursive descent and a good schematic description would be PEG grammars as the choice operator is always ordered (you are trying alternatives in the order they are referenced). These style of parsing can support a nice error reporting. E.g. I have implemented Arpeggio library which uses a variation of this approach and it reports errors quite well, but the problematic part is error recovery as the part of the parser state is on the stack (as you are using recursive function calls) so you can’t manipulate the stack easily to put the parser in an arbitrary state during recovery.

Parsing approaches with explicit state handling (e.g. LL/LR and its variations) can support error recovery easily (and thus can report multiple errors in one go).

Another difference worth mentioning is that bottom up parsing approaches allow left-recursive rules so the grammars tend to be more natural. E.g. it was always much easier for me to read:

expr = expr + expr | expr * expr | ( expr ) | int

than:

expr = term + expr | term
term = factor * term | factor
term = ( expr ) | int

First grammar can’t be implemented with a naive recursive descent parsing approach. Another problem with the first grammar is that it doesn’t encode priority so the language is ambiguous (is 2+3*5 equal 17 or 25?). So either you will use this kinda cryptic style of encoding priorities or your parser generator can use an extended version of BNF spec to allow these information specified declaratively, like in
parglare:

expr: expr + expr {left, 1} | expr * expr {left, 2} | ( expr ) | int

where associativity and priority is declared inside {}.

I tend to prefer declarative approach as I think a good DSL (which a grammar definition language is!) should take away thinking about the inner workings of the interpreter/compiler. User should be free to specify the solution without these constraints.

Hope this helps a bit. I would be glad to extend the discussion in a direction that interest you the most but as I said I’m not an expert and have been exposed to only some of the parsing algorithm in a greater extent so all this should be taken with a grain of salt.

a_bakic · June 5, 2020, 8:14pm

A new PEG parsing technique: https://arxiv.org/abs/2005.06444 (compared to an older one: http://www.vpri.org/pdf/tr2007002_packrat.pdf).

alessio.stalla · June 10, 2020, 4:17pm

For the “parser state is on the stack” issue, have you ever thought or experimented with continuation-passing style or CPS? My understanding of it is that the call stack is no longer a stack but a tree (or graph) and it can be navigated by saving a continuation (reified as a function/closure) and calling back into it (akin to returning at some point back into the calling stack, but without necessarily losing the rest of the stack). There appear to be implementations in Python, for example, I stumbled upon http://baruchel.github.io/blog/python/2015/07/10/continuation-passing-style-in-python/

igor.dejanovic · June 22, 2020, 8:10pm

Looks like an interesting approach worth exploring. Never thought about it before. My first impression is that, although doable it would still be more complex that a plain recursive-descent which would make it less appealing. But, worth experimenting with it. Thanks!

pastoreerrante · June 25, 2020, 12:39pm

Two resources I found useful on this topic:

and

https://www.youtube.com/playlist?list=PLP29wDx6QmW5yfO1LAgO8kU3aQEj8SIrU