Which parser generator are you using (if any)?

Ah right. Well, with GLR and GLL algorithms the parser can accept any context-free grammar. So the grammar does not have to be in LR or LL form.

This means that the BNF formalism can have modularity features and composable grammars, because you can just throw rules together and the parser will still “work”.

This also means you can write E ::= E “+” E | E “*” E | ... ; and so on; let’s say for C you have 20 to 30 such rules, without the additional nonterminals otherwise needed to get the grammar into an LR or LL shape.

So grammars for GLR and GLL have a lot fewer nonterminals, and they can be composed and reused without change.

There is a cost for this, however. First, the parsing time is not optimal (but usually still nearly linear for programming languages). Second, a grammar might be ambiguous, so you may get more than one tree. The “BNF” caters for this with additional constraints, such as “>” between rules for operator precedence and whitespace constraints for defining the offside rule and the like. Nevertheless, ambiguity, and the fact that you have to find out about it, is the main cost to pay for the simplicity and modularity of general context-free grammars.
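
To make the ambiguity point concrete, here is a tiny sketch in Python (purely illustrative, not tied to any particular GLR/GLL tool) that brute-forces every parse of a token list under the ambiguous rule E ::= E “+” E | E “*” E | int, showing that even “1 + 2 * 3” has two trees:

def parses(tokens):
    # All parse trees of `tokens` under E -> E '+' E | E '*' E | int.
    if len(tokens) == 1 and tokens[0].isdigit():
        return [tokens[0]]                                # E -> int
    trees = []
    for i, tok in enumerate(tokens):
        if tok in ("+", "*"):                             # try every operator as the root
            for left in parses(tokens[:i]):
                for right in parses(tokens[i + 1:]):
                    trees.append((tok, left, right))
    return trees

print(parses("1 + 2 * 3".split()))
# [('+', '1', ('*', '2', '3')), ('*', ('+', '1', '2'), '3')]  <- two trees for one input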

Thanks for the link to the paper btw!

I’m doing the same thing within an LR grammar:

Exp : Primary
    | Exp '+' Exp *> add_
    | Exp '-' Exp *> sub_
    | Exp '*' Exp *> mul_
    | Exp '/' Exp *> div_
    ;
It’s the disambiguating rules that make this work properly.
LRstar is free and contains 9 sample grammars:
http://lrstar.tech


Nice! That’s indeed what we want. I’ll read about it in the paper, right? Composing grammars too?

I guess your transformations not only disambiguate but also make the grammar deterministic. Right?

Yes, the disambiguating rules make the grammar LALR(1) or LR(1)
and therefore deterministic. The disambiguating rules are simple:

/* Operator precedence. */

{ '==' '!=' } <<   // Lowest priority.
{ '+'  '-'  } <<
{ '*'  '/'  } <<   // Highest priority.
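
(For comparison only, and not LRstar's actual mechanism: the same priority table can be written as a little precedence-climbing parser in Python, which deterministically builds exactly one of the trees the ambiguous grammar allows. The names here are made up for illustration.)

PREC = {"==": 1, "!=": 1, "+": 2, "-": 2, "*": 3, "/": 3}     # low -> high priority

def parse_expr(tokens, min_prec=1):
    # `tokens` is a list consumed from the front; a Primary is assumed to be one token.
    left = tokens.pop(0)
    while tokens and tokens[0] in PREC and PREC[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        right = parse_expr(tokens, PREC[op] + 1)              # higher-priority ops bind tighter
        left = (op, left, right)
    return left

print(parse_expr("1 + 2 * 3".split()))   # ('+', '1', ('*', '2', '3')) -- exactly one tree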

The parsing speed is very fast, reading a 227,000-line file in 0.095 seconds
while building a symbol table and AST. And the parser handles the “typedef”
problem automatically. And there is more …
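
(For readers who have not run into it: the “typedef” problem is that a C identifier must lex differently once it has been declared as a type name. Below is a minimal lexer-feedback sketch in Python; it is not necessarily how LRstar handles it internally, just an illustration of the problem.)

typedef_names = set()                     # filled in as typedef declarations are parsed

def classify(identifier):
    # The lexer asks the symbol table which token kind the parser should see.
    return "TYPE_NAME" if identifier in typedef_names else "IDENTIFIER"

typedef_names.add("size_t")               # after seeing: typedef unsigned long size_t;
print(classify("size_t"))                 # TYPE_NAME  -> "size_t * p;" is a declaration
print(classify("count"))                  # IDENTIFIER -> "count * p;"  is a multiplication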


Nice to meet another parsing expert and enthusiast!

Thanks. I’ve been in the shadows for 30 years, an unknown, unappreciated, etc.


It happens to a lot of parsing people. If you are ever near Amsterdam, please feel free to visit!

I have Dutch ancestors, but I’ve never been to the Netherlands. I’m half European,
but stuck in the US for now. The linguistic part of computer science is a very neglected
area, just like the usability and readability areas.


Paul,
Could you create a commercial tool able to read a grammar file and emit valid but random source code? Some sort of CSmith, but not bound to the C language. A generic tool.
Such a tool would be very useful for stress-testing DSLs or anything that can have a grammar (protocols, smart contracts, etc.).

Also, I would like to point you to an interesting match between data visualization and semantics:
“Large scale semantic representation with flame graphs”, Institute for Excellence in Higher Education, Tohoku University.
The authors of the paper use a type of visualization called a flame graph, invented by Brendan Gregg.


Yes, I think the generated parser could have a built-in random symbol generator which
selects the next symbol from the valid symbols of the current state, then goes to
the designated state and repeats until EOF is chosen. It would need some way to make
it stop, because the number of choices could be huge.
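
A rough, grammar-level sketch of that idea in Python (random choices plus a depth limit as the stopping mechanism; the grammar and names are made up for illustration, and it works from the rules rather than from the parse tables described above):

import random

GRAMMAR = {
    "Exp":     [["Primary"], ["Exp", "'+'", "Exp"], ["Exp", "'*'", "Exp"]],
    "Primary": [["'1'"], ["'2'"], ["'('", "Exp", "')'"]],
}

def generate(symbol, depth=0, max_depth=6):
    if symbol not in GRAMMAR:                    # terminal: strip the quotes and emit it
        return symbol.strip("'")
    alts = GRAMMAR[symbol]
    if depth >= max_depth:                       # force termination: shortest alternative only
        alts = [min(alts, key=len)]
    return " ".join(generate(s, depth + 1, max_depth) for s in random.choice(alts))

print(generate("Exp"))                           # e.g.  1 + ( 2 * 1 )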

Paul,

I would like to suggest another difficult area you could explore:
a code migration tool able to transpile from language A to language B, for example:
C -> Go
FORTRAN -> Julia
Oracle PL/SQL -> Java & SQL
Oracle SQL -> Microsoft SQL
Go -> Rust
C -> Rust (see, for example, the C2Rust tool)

Yes, that is one of my goals, to translate from one language to another.
I have the tool now (LRstar, http://lrstar.tech). It only took 30 years
to perfect it, without (much) pay.
Now I need to see some money (venture capitalists or angel investors)
before I spend another hour on this stuff.

I have seen that there is some interest in this one. We have also written an article on it: Convert PL/SQL code to Java

I couldn’t agree more. BNF-like languages are the languages of our domain. Not only is it easier to implement and maintain parsers using these DSLs, it is also easier to communicate ideas, improve consistency, capture domain knowledge, etc., the usual benefits of applying DSLs. In my opinion, it is well worth the time investment to learn these DSLs and related tools if you are planning to work in this field. Manual parser implementation is interesting, in my view, if you want to learn in more depth how parsers work.


I remember that article, it is an interview.


Paul,

Just some ideas, in random order, that you could pursue without too much cash.
o Set up a “cloud” transpiler service: low cost, high volume of source files, priced by number of LOCs.

o Your tool is a perfect match for databases; parsing SQL dialects is notoriously hard. There is a wave of so-called GPU databases like OmniSci, Kinetica, SQream, etc. OmniSci was a bison/yacc shop and switched to Apache Calcite despite the fact that their kernel is written in C++; SQream uses an in-house solution written in Haskell; Kinetica I do not know, but you could approach them with a business proposal. Last time I checked they were at SQL-92 or thereabouts ?!?!

o HPC: Cray Inc. developed a language called Chapel dedicated to parallel programming; translating FORTRAN libraries to Chapel under a professional services agreement could be an option.

o Promote your tool(s) more broadly.

o RISC-V processors: there is a new open-source processor architecture called RISC-V, and you could help its adopters with their testing process.

European initiatives in the HPC field:
o EuroHPC is a joint collaboration between European countries and the European Union to develop and support exascale supercomputing by 2022/2023.

o European Processor Initiative

Hope this helps.


I believe the reverse (Java+SQL -> stored procedures) could also be commercially interesting. Stored procedures are not a bad idea per se, but the languages for stored procedures are quite bad.


Thanks for the suggestions.