Virtual Meetup - Generating compilers purely from context-free grammars

elisa.costantini · October 13, 2021, 11:38am

Hi Community,

I am happy to announce that this Thursday (the 14th of October), Denis Kuniß will hold a discussion about “Generating compilers purely from context-free grammars”.

Also, remember that we changed the link to join the Meetup!

Generating compilers, and not only parsers, from declarative specifications and without implementing them is possible! This talk will introduce the calculus of Extended Affix Grammars, a specific form of two-level grammars, for specifying compilers purely by declarative means. Based on this calculus the compiler generator implementation Gamma, initially developed at the TU Berlin, is shown. We will step through a real context-sensitive small example and generate a running compiler from it. By means of another example, it is shown how semantic context checks can be specified using the introduced calculus. The current implementation is available on GitHub and the hands-on sessions will show it running at a browser-based Gitpod environment in VS code with grammar editor support, accessible to anyone just on a mouse click.

Denis has a Computer Science degree from TU Berlin. At his diploma thesis, he implemented a part of the shown Extended Affix Grammar compiler generator. As a student freelancer, he built and maintained legacy language analysis tools using the compiler construction toolbox Cocktail helping to solve the year-2000-problem. Today he works as a software architect at Diebold Nixdorf and at OMG standardizing APIs for retail peripherals.
He is an enthusiastic user of the Xtend language and the Xtext framework.
He is convinced of the advantages of the flow design approach and the flow programming model and has written several articles in regard to that in different German and online CS magazines.

And if you are thinking of proposing a talk, it is time to come forward. Just let me know by replying to this message.

How to connect

To avoid other security issues is now necessary to register for the meeting. The registration should be necessary just once and be valid for all the next meetings you will participate in. I understand it is a little extra effort, but it would avoid problems like the ones we encountered:

Registration for the Virtual Meetup

After registering, you will receive a confirmation email containing information about joining the meeting. It will also permit you to add it to your calendar.

Time

It is hosted on Zoom at 6 PM GMT+1/CEST (you can use this link to figure out which time is in your timezone: Dateful Time Zone Converter).

Cheers,
Elisa

P.S. We get a recurring question: “Are presentations recorded?”. The answer is not, and the reasons are explained here On recording Virtual Meetups - #7 by voelter

Niko · October 14, 2021, 5:27pm

@kuniss Some more context to the DSL idea I mentioned at the end of your great presentation:

The current language (aka grammar) is very close to the theory. This allows a solid implementation. However, it’s quite hard to read, and I guess also to write: from browsing through the examples, simple things seem to be quite elaborate. Case in point: instead of A-Z we have to write A | B | C ....

So the DSL on top of this grammar would allow for higher abstractions, e.g. the aforementioned shorthand, or things you need all the time like symbol tables, nested scopes, etc.

kuniss · October 14, 2021, 9:35pm

Here are the slides from this talk as handout: Gamma presentation handout.pdf (234.2 KB)

It contains some additional slides I not had the time to show.

Don’t hesitate to reach out to me or Mario Kröplin (@linkrope) in case of questions!

kuniss · October 14, 2021, 10:18pm

Thanks for the kind words.

I got your point, I guess!

In fact, the current EAG grammar description language is some noisy. E.g., to write down a rule for digits which have the same representation in the input and in the output language, you would need to specify the following hyper rule (which combines parser rule and affix forms as affix parameters as a kind of one-to-one mapping):

digit:
    <+ '0': digit> '0' | <+ '1': digit> '1' | <+ '2': digit> '2' 
  | <+ '3': digit> '3' | <+ '4': digit> '4' | <+ '5': digit> '5' 
  | <+ '6': digit> '6' | <+ '7': digit> '7' | <+ '8': digit> '8' 
  | <+ '9': digit> '9'.

And in addition, the meta rule needs to specified:

digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'.

(Please note, that the ‘digit’ non-terminal exists 2 times, first as non-terminal of the hyper grammar, second as non-terminal of the meta grammar.)

This becomes especially a problem, if you want to specify Unicode identifiers…

So, we have already thought of introducing a short-hand for that kind of specifications (1:1 mappings) which implicitly expands to the original meaning.

However, this is always a balancing act, as Scala showed with their implicits… To much implicit can something made hard to understand or hard to reason about.

The beauty of the EAG formalism is that it has only a few concepts and principles. When you get them, it is easy to reason about a specification. All other is explicit.

This was also the original goal of Extended Affix Grammars and their implementation at the TU Berlin: To allow to specify compilers which behaves equally no matter who had implemented and interpreted the language specification. Back in the beginning of Computer Science (remember ALGOL-68 and van Wijngaarden’s 2-level grammars) this was a real problem which was tackled by EAGs.

But that time we did not know Unicode, or at least did not consider it…

It’s a pity that the research for EAGs had been suddenly stopped 20 years ago (the research group was resolved). But now we have hope to push it forward again. Maybe not as a research topic, but to get it in use for a broader audience and improve its usability.

jschuhmacher · October 18, 2021, 3:20pm

Sorry to have missed this presentation; very nice to see EAG’s again.

For inspiration, you might also want to check out: The Extended Affix Grammars Project (ru.nl). The ftp links are broken, but here is a direct link, should anyone want to try it out: Index of /eag (ru.nl). (It is the last version of EAG by C.H.A. Koster from ~ 2009).

This one sports a feature called semi-terminals, it allows:

digit: "0"; "1"; "2"; "3"; "4"; "5"; "6"; "7"; "8"; "9"; "0".

to be specified as:

digit: {0-9}.

but also regex-like use, i.e.:

id: {a-z}+.

The syntax follows Van Wijngaarden’s grammar syntax closely, it is quite compact.

kuniss · October 18, 2021, 6:50pm

Thanks for the links, Jelle.
Came across your first link several months ago. However, the links did not work for me. So, I did not dig further.
Good to have them now. Already downloaded.

The EAG project seems to have a scanner generator implemented or applies an external one (not sure of course afetr a first look).
We do not have a real scanner generator (in the sense of constructing DFAs). We had concentrated our effort mainly into the evaluator generators.

But short hands for identifier and literal specifications are definitely worth a closer look!
However, at the end it is not where the semantics play…

Interestingly, even if the formalism is similar the concrete specification syntax is completely different in both projects.

You may try out the links in the presentation. After registering at Gitpod you may try out a running compiler generator which generates and builds real runnable compilers.

sanyaade · October 19, 2021, 2:46pm

Yesterday I was going through the slides when I remembered I have a book on BNFC on one of my bookshelves. That made me spend some hours on combing the internet for some similar tools:

http://bnfc.digitalgrammars.com/
The BNF converter · GitHub
Welcome to BNFC’s documentation! — BNFC 2.9.4 documentation (online tutorial)

This could be used to port from D language to say Java or other platforms

God blesses!!!

Best regards,
Sanyaade

kuniss · October 25, 2021, 9:09pm

Thanks for the hint and the links, Sanyaade!

But I guess, here is a misunderstanding: We do not want to port the compiler generator from D to another language. It was already ported from Oberon-2 to D…