Which parser generator are you using (if any)?

Ah right. Well, with GLR and GLL algorithms the parser can accept any context-free grammar. So the grammar does not have to be in LR or LL form.

This means that the BNF formalism can have modularity features and composable grammars, because you can just throw rules together and the parser will still “work”.

This also means you can write E ::= E “+” E | E “*” E | ... ; and so on; let’s say for C you have 20 to 30 such rules, without the additional nonterminals otherwise needed to get the grammar into an LR or LL shape.

So grammars for GLR and GLL have a lot fewer nonterminals, and they can be composed and reused without change.

There is a cost for this, however. First, the parsing time is not optimal (but usually still nearly linear for programming languages). Second, a grammar might be ambiguous, so you may get more than one tree. The “BNF” caters for this with additional constraints, such as “>” between rules for operator precedence and whitespace constraints for defining the offside rule and the like. Nevertheless, ambiguity, and the fact that you have to find out about it, is the main cost to pay for the simplicity and modularity of general context-free grammars.
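
To make the ambiguity point concrete, here is a tiny sketch in Python (purely illustrative, not tied to any particular GLR/GLL tool) that brute-forces every parse of a token list under the ambiguous rule E ::= E “+” E | E “*” E | int, showing that even “1 + 2 * 3” has two trees:

def parses(tokens):
    # All parse trees of `tokens` under E -> E '+' E | E '*' E | int.
    if len(tokens) == 1 and tokens[0].isdigit():
        return [tokens[0]]                                # E -> int
    trees = []
    for i, tok in enumerate(tokens):
        if tok in ("+", "*"):                             # try every operator as the root
            for left in parses(tokens[:i]):
                for right in parses(tokens[i + 1:]):
                    trees.append((tok, left, right))
    return trees

print(parses("1 + 2 * 3".split()))
# [('+', '1', ('*', '2', '3')), ('*', ('+', '1', '2'), '3')]  <- two trees for one input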

Thanks for the link to the paper btw!

I’m doing the same thing within an LR grammar:

Exp : Primary
    | Exp '+' Exp *> add_
    | Exp '-' Exp *> sub_
    | Exp '*' Exp *> mul_
    | Exp '/' Exp *> div_
    ;
It’s the disambiguating rules that make this work properly.
LRstar is free and contains 9 sample grammars:
http://lrstar.tech


Nice! That’s indeed what we want. I’ll read about it in the paper, right? Composing grammars too?

I guess your transformations not only disambiguate but also make the grammar deterministic. Right?

Yes, the disambiguating rules make the grammar LALR(1) or LR(1)
and therefore deterministic. The disambiguating rules are simple:

/* Operator precedence. */

{ '==' '!=' } <<   // Lowest priority.
{ '+'  '-'  } <<
{ '*'  '/'  } <<   // Highest priority.
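
(For comparison only, and not LRstar's actual mechanism: the same priority table can be written as a little precedence-climbing parser in Python, which deterministically builds exactly one of the trees the ambiguous grammar allows. The names here are made up for illustration.)

PREC = {"==": 1, "!=": 1, "+": 2, "-": 2, "*": 3, "/": 3}     # low -> high priority

def parse_expr(tokens, min_prec=1):
    # `tokens` is a list consumed from the front; a Primary is assumed to be one token.
    left = tokens.pop(0)
    while tokens and tokens[0] in PREC and PREC[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        right = parse_expr(tokens, PREC[op] + 1)              # higher-priority ops bind tighter
        left = (op, left, right)
    return left

print(parse_expr("1 + 2 * 3".split()))   # ('+', '1', ('*', '2', '3')) -- exactly one tree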

The parsing speed is very fast, reading a 227,000-line file in 0.095 seconds
while building a symbol table and AST. And the parser handles the “typedef”
problem automatically. And there is more …
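
(For readers who have not run into it: the “typedef” problem is that a C identifier must lex differently once it has been declared as a type name. Below is a minimal lexer-feedback sketch in Python; it is not necessarily how LRstar handles it internally, just an illustration of the problem.)

typedef_names = set()                     # filled in as typedef declarations are parsed

def classify(identifier):
    # The lexer asks the symbol table which token kind the parser should see.
    return "TYPE_NAME" if identifier in typedef_names else "IDENTIFIER"

typedef_names.add("size_t")               # after seeing: typedef unsigned long size_t;
print(classify("size_t"))                 # TYPE_NAME  -> "size_t * p;" is a declaration
print(classify("count"))                  # IDENTIFIER -> "count * p;"  is a multiplication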


Nice to meet another parsing expert and enthusiast!

Thanks. I’ve been in the shadows for 30 years, an unknown, unappreciated, etc.


It happens to a lot of parsing people. If you are ever near Amsterdam, please feel free to visit!

I have Dutch ancestors, but I’ve never been to the Netherlands. I’m half European,
but stuck in the US for now. The linguistic part of computer science is a very neglected
area, just like the usability and readability areas.


Paul,
Could you create a commercial tool able to read a grammar file and emit valid but random source code? Some sort of CSmith, but not bound to the C language. A generic tool.
Such a tool would be very useful for stress-testing DSLs or anything that can have a grammar (protocols, smart contracts, etc.).

Also, I would like to point you to an interesting match between data visualization and semantics:
“Large scale semantic representation with flame graphs”, Institute for Excellence in Higher Education, Tohoku University.
The authors of the paper use a type of visualization called a flame graph, invented by Brendan Gregg.


Yes, I think the generated parser could have a built-in random symbol generator which
selects the next symbol from the valid symbols of the current state, then goes to
the designated state and repeats until EOF is chosen. It would need some way to make
it stop, because the number of choices could be huge.
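
A rough, grammar-level sketch of that idea in Python (random choices plus a depth limit as the stopping mechanism; the grammar and names are made up for illustration, and it works from the rules rather than from the parse tables described above):

import random

GRAMMAR = {
    "Exp":     [["Primary"], ["Exp", "'+'", "Exp"], ["Exp", "'*'", "Exp"]],
    "Primary": [["'1'"], ["'2'"], ["'('", "Exp", "')'"]],
}

def generate(symbol, depth=0, max_depth=6):
    if symbol not in GRAMMAR:                    # terminal: strip the quotes and emit it
        return symbol.strip("'")
    alts = GRAMMAR[symbol]
    if depth >= max_depth:                       # force termination: shortest alternative only
        alts = [min(alts, key=len)]
    return " ".join(generate(s, depth + 1, max_depth) for s in random.choice(alts))

print(generate("Exp"))                           # e.g.  1 + ( 2 * 1 )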

Paul,

I would like to suggest another difficult area you could explore:
a code migration tool able to transpile from language A to language B, for example:
C -> Go
FORTRAN -> Julia
Oracle PL/SQL -> Java & SQL
Oracle SQL -> Microsoft SQL
Go -> Rust
C -> Rust (see, for example, the C2Rust tool)

Yes, that is one of my goals, to translate from one language to another.
I have the tool now (LRstar, http://lrstar.tech). It only took 30 years
to perfect it, without (much) pay.
Now I need to see some money (venture capitalists or angel investors)
before I spend another hour on this stuff.

I have seen that there is some interest in this one. We have also written an article on it: Convert PL/SQL code to Java

I couldn’t agree more. BNF-like languages are the languages of our domain. Not only is it easier to implement and maintain parsers using these DSLs, it is also easier to communicate ideas, improve consistency, capture domain knowledge, etc., the usual benefits of applying DSLs. In my opinion, it is well worth the time investment to learn these DSLs and related tools if you are planning to work in this field. Manual parser implementation is interesting, in my view, if you want to learn in more depth how parsers work.


I remember that article, it is an interview.


Paul,

Just some ideas, in random order, that you could pursue without too much cash.
o Set up a “cloud” transpiler service: low cost, high volume of source files, priced by number of LOCs.

o Your tool is a perfect match for databases; parsing SQL dialects is notoriously hard. There is a wave of so-called GPU databases like OmniSci, Kinetica, SQream, etc. OmniSci was a bison/yacc shop and switched to Apache Calcite despite the fact that their kernel is written in C++; SQream uses an in-house solution written in Haskell; Kinetica I do not know, but you could approach them with a business proposal. Last time I checked they were at SQL-92 or thereabouts ?!?!

o HPC: Cray Inc. developed a language called Chapel dedicated to parallel programming; translating FORTRAN libraries to Chapel under a professional services agreement could be an option.

o Promote your tool(s) more broadly.

o RISC-V processors: there is a new open-source processor architecture called RISC-V, and you could help its adopters with their testing process.

European initiatives in the HPC field:
o EuroHPC is a joint collaboration between European countries and the European Union to develop and support exascale supercomputing by 2022/2023.

o European Processor Initiative

Hope this helps.


I believe the reverse (Java+SQL -> stored procedures) could also be commercially interesting. Stored procedures are not a bad idea per se, but the languages for stored procedures are quite bad.


Thanks for the suggestions.