Metadata-driven generic compilers?

Folks, I am looking for a generic compiler based on a somewhat uniform host syntax, one that accepts the abstract syntax as a pluggable (in-memory) object model description at runtime (a grammar of sorts, but expressed as native-language objects).

Anyone familiar with tools like that, ideally for Java? The closest I have seen is ESON, but it is dead now.

Edit: I guess I am looking for parsing tools based on parser combinators, and many options exist for Java. That covers the dynamic part. Now I only need a solution that takes care of the generic elements of the concrete syntax (like ESON did). Any suggestions? (If I have to take care of that part myself, it should not be too bad, but ideally it would come out of the box.)
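To make the parser-combinator style concrete, here is a minimal hand-rolled sketch in Java (no external library; real options include jparsec and funcj.parser). All names in it are made up for illustration; it parses a typed parameter like `toSend : Email` from the example later in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal hand-rolled parser-combinator sketch; all names here are made up
// for illustration, not from any real library.
public class Combinators {
    // A parser consumes input from a position and returns the new position,
    // or -1 on failure; matched text is appended to `out`.
    interface P { int parse(String s, int pos, List<String> out); }

    // Matches a literal token, skipping leading whitespace.
    static P token(String t) {
        return (s, pos, out) -> {
            int p = skipWs(s, pos);
            if (s.startsWith(t, p)) { out.add(t); return p + t.length(); }
            return -1;
        };
    }

    // Matches an identifier (Java identifier characters).
    static P ident() {
        return (s, pos, out) -> {
            int p = skipWs(s, pos), start = p;
            while (p < s.length() && Character.isJavaIdentifierPart(s.charAt(p))) p++;
            if (p == start) return -1;
            out.add(s.substring(start, p));
            return p;
        };
    }

    // Sequencing combinator: all parts must match, in order.
    static P seq(P... parts) {
        return (s, pos, out) -> {
            for (P part : parts) {
                pos = part.parse(s, pos, out);
                if (pos < 0) return -1;
            }
            return pos;
        };
    }

    static int skipWs(String s, int p) {
        while (p < s.length() && Character.isWhitespace(s.charAt(p))) p++;
        return p;
    }

    public static void main(String[] args) {
        // Parse a typed parameter such as "toSend : Email".
        P param = seq(ident(), token(":"), ident());
        List<String> out = new ArrayList<>();
        int end = param.parse("toSend : Email", 0, out);
        System.out.println(out + " consumed=" + end); // [toSend, :, Email] consumed=14
    }
}
```

The point is that parsers are plain values here, so they can be composed at runtime, which is exactly what a metamodel-driven generic parser would need.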

Cheers,

Rafael

I looked at ESON to try to understand your question, and it seems to me that you are asking for a dynamic modeling tool with a generic textual persistence.

I only know of language workbenches; they are a superset of such a tool, but you can simply use them for the three services you need:

  • dynamic deployment of metamodels
  • load/save of model instances conforming to a JSON-like format derived from the metamodels
  • a Java modeling API to manipulate the models

For example, the Whole Platform (my open-source tool) is also distributed as a small Maven artifact (without Eclipse and the user interface).
It includes four generic persistences that meet your needs: one based on JSON-LD, another on XSI, and two executable ones based on Java and Swift.

Hi Riccardo, I realize I was not clear about my goals: I want a cheap way to build textual DSLs for any object model. Just as an example, let’s say you have a class model such as this:

http://abstratt.github.io/kirra/com/abstratt/kirra/Namespace.html

Then I would like to be able to parse a Namespace graph from user-provided input such as this:

namespace ... {

    service EmailService {
        operation sendEmail(toSend : Email)
    }
    
    user entity User {
    }

    entity Expense {

        operations: 
            // if no named children are required, no need for grouping braces
            finder open()

            finder over(/* attributes and default children */ limit : Decimal) {
                /* named children */
                permissions:
                    //
            }

            finder from(employee : Employee)
            ...
            action submit()

            ...

        properties:
            ...

        relationships:
            ...
    }  

    tuple_type Email {
        // properties:   <- not needed, as a TupleType may only contain properties
        subject : String
        body : Memo
        from : String
        to : String
    } 
}

The ingredients for building this parser are all more or less in the class tree (I may need to provide some additional information stating which objects are just references vs. child objects). The base host syntax (use of curly braces, parentheses, colons, etc., and how modifiers and metaclasses are identified) could be imposed and fixed by the generic parser, as long as the resulting notation has good usability. The rest could be derived via reflection from the class graph.
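As a rough sketch of the "derive via reflection" idea, assuming a naive classification rule (collection-typed getters become named child groups, everything else an attribute; distinguishing references from children would need the extra metadata mentioned above). `FeatureScanner` and all names here are invented for illustration, not an existing tool:

```java
import java.lang.reflect.Method;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: derive DSL feature metadata from a class via
// reflection. FeatureScanner and its classification rule are assumptions
// for illustration, not an existing tool.
public class FeatureScanner {
    public enum Kind { ATTRIBUTE, CHILDREN }

    // Classify each getter: collection-typed getters become named child
    // groups; everything else is treated as an attribute, for simplicity.
    public static Map<String, Kind> scan(Class<?> modelClass) {
        Map<String, Kind> features = new TreeMap<>();
        for (Method m : modelClass.getMethods()) {
            if (!m.getName().startsWith("get") || m.getParameterCount() > 0) continue;
            if (m.getDeclaringClass() == Object.class) continue; // skip getClass()
            String name = m.getName().substring(3).toLowerCase();
            Kind kind = Collection.class.isAssignableFrom(m.getReturnType())
                    ? Kind.CHILDREN : Kind.ATTRIBUTE;
            features.put(name, kind);
        }
        return features;
    }

    // Toy model class standing in for something like Kirra's Entity.
    public static class Entity {
        public String getName() { return "Expense"; }
        public List<String> getOperations() { return List.of(); }
    }

    public static void main(String[] args) {
        System.out.println(scan(Entity.class)); // {name=ATTRIBUTE, operations=CHILDREN}
    }
}
```

A generic parser could feed a feature map like this into its fixed host syntax (braces for child groups, `name : value` for attributes) to produce the notation shown above.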

Is that clearer now?

Now it is clearer, even if I still have some doubts.

You are asking for something that typically in a LWB is decoupled in two facilities, let me call them:

  1. APIs to models mapper
  2. grammars to models mapper

Compared to a direct metamodel definition, they are two alternative ways to define a DSL, starting from an existing API or an existing grammar.

Unfortunately, they are designed in a way that makes them non-composable for your goal:

  1. You (dynamically) give (to my Pojo DSL) a Java API, and you get a derived DSL with a two-way mapping to the API.
  2. You (dynamically) give (to my Grammar DSL) a grammar, and you get a derived DSL with a two-way mapping to text.

As far as I know, this approach is also followed by other LWBs that have these facilities.

So, both DSLs assume you are committed on the API/grammar side and almost unconstrained on the derived-DSL side. But then you would have to dynamically derive an additional two-way mapper between possibly unrelated DSLs (good luck :slight_smile:).
As a result, the minimum-effort path could be to use only the dynamic grammar facility and generate the “grammar-derived DSL”/API mapper.

And now my two doubts:

  • ESON is not a grammar-based facility; it is just a generic textual persistence for EMF models. I don’t know whether you can customize anything on the grammar side. Is it really not enough for you to get something human-readable/writable?
  • Is the object model API really added dynamically to the system, or was it just derived at generation time? In the latter case, you would be in a much better position, and perhaps you could also use a static parser generator like ANTLR.

By interpreting your initial question literally, I was thinking “that’s Lisp!” :wink:
However, looking at your example, it seems to me that you want marshalling and unmarshalling of Java objects to and from a well-known format. There are solutions for that already: JAXB for XML, Jackson for JSON, SnakeYAML for YAML, and so on.

Hi @alessio.stalla! Yes, it does look a lot like a Lisp. These things set it apart, though:

  1. I need more syntactic sugar than Lisps provide (ESON is a good example)
  2. Not only sugar: I want to impose some rails on the kinds of constructs available (basically, data object declarations, their attributes, children, and references)
  3. It is meant for “external DSLs”, and as such only the DSL semantics should be in place

Finally, though serializing an object to that syntax would be possible, that is not a goal or a driving use case; authoring is the primary one. The goal is to allow any object-oriented developer to create “external DSLs” for the users of their applications with little to no effort. That is why XML, JSON, and YAML would not fit the bill: they are too basic and too machine-oriented, and thus insufficiently human-friendly.

So, summarizing:

  • generic/fixed basic concrete syntax
  • meant for authoring, not persistence
  • pluggable metamodels (in the form of class models in the application’s own language, instead of proper grammars)
  • focus on modeling data, no need for behavior, expressions etc
  • works at runtime (no code generation)
  • readily accessible to any developer (no previous knowledge of parsing needed, so no “compiler compilers”)

@solmi.riccardo The API-to-DSL mapping is really the use case here: users do not need to know anything about grammars, languages, etc. Being unconstrained on the DSL side is fine. I didn’t get the bit about “an additional two-way mapper for possibly unrelated DSLs”; what would be the goal there?

ESON is not a grammar based facility

Being metamodel-based rather than grammar-based is exactly my goal.

it is just a generic textual persistence for EMF models

It is that too, but it is also meant to provide human-friendly textual languages for creating EMF models.

I don’t know if you can customize something in the grammar side

You can’t customize the base concrete syntax, but the abstract syntax is pluggable. So, effectively, the final concrete syntax is derived from the metamodel (even if constrained by the choices made by the base concrete syntax).

Is the object model API really added dynamically to the system or was it just derived at generation time? In the latter case, you would be in a much better position and perhaps you could also use a static Parser generator like Antlr.

The dynamic part was to emphasize that I am looking for solutions that work exclusively at runtime, so no transformations, code generation, etc. Also, no explicit grammar authoring, as the target user is an OO programmer with no knowledge of how compilers are built.

Hmm, @ftomassetti will know better, but perhaps ANTLR + Kolasu (Kotlin) could work for you. You write a fairly generic concrete syntax, and then let developers choose how to map it to objects in the target library (and which forms to reject). It really depends on how much control (and expertise) you want/expect developers to have, vs how much you want the grammar to be driven by the object model.

In general, though, a sufficiently rich object model also has some kind of protocol that you have to respect in order to use it properly: at a very basic level, “this is mandatory” vs “this is optional”, but also “this should be > 42” or “if you set this, you must (also/not) set that”, “you should set this before that” or “if you add this to such and such collection, then you must set that as the parent” and so on. If you can ensure that you only deal with data objects, that’s fine, otherwise, you’ll need more and more additional metadata depending on how complex you want it to be.

@alessio.stalla I was looking for something that already existed, so I would not have to reinvent the wheel, or much harder, convince people to use it… :slight_smile:

In general, though, a sufficiently rich object model also has some kind of protocol that you have to respect in order to use it properly: at a very basic level, “this is mandatory” vs “this is optional”, but also “this should be > 42” or “if you set this, you must (also/not) set that”, “you should set this before that” or “if you add this to such and such collection, then you must set that as the parent” and so on. If you can ensure that you only deal with data objects, that’s fine, otherwise, you’ll need more and more additional metadata depending on how complex you want it to be.

That is a good point, and one that I assume could be addressed by combining:

  • additional metadata obtained from language types (collections are multi-valued, optional references are nullable), custom annotations, and third-party annotations (ORM frameworks have similar needs: relating the two ends of the same association, marking elements as optional/required), again within the reach of any programmer.
  • for additional custom validation, simple visitor-based validators, like those Xtext provides. That should be familiar to any OO programmer.

Thanks for the feedback so far, Alessio. I am starting to believe that metamodel/class model-based generic parsers are really not a common thing.

If you want something that exists, there’s plenty of existing formats for describing objects including those that we’ve cited already – XML, JSON, YAML, S-Expressions, you name it. Maybe you just have to find the right one for your use case.

Thanks, Alessio, but those alternatives have severe drawbacks given the goals I mentioned before (though they definitely pass the “generic syntax” test). Something like ESON seems significantly more usable as a notation meant for authoring than XML, JSON, YAML, S-Expressions etc (some ESON model examples here).

Just as an example, I believe an approach with a generic parser can attain something as nicely readable as QML, for instance (see example).

Then maybe just write a parser for ESON and the code to map that to plain Java objects (or whatever)?

Yeah, it looks like that, so far…

Some time ago I started, and of course never finished, this: evan-lang. If I understand @rafael correctly, that might not be entirely different from what he’s looking for.

The basic idea was to use JSON as a programming language, in very roughly the same way that Lisp is “just” interpretable data. In fact, I found very similar ideas in John Backus’s Turing Award lecture.

I got nowhere with it because I had no time for it, and because I was a bit scared off by my sole contributor having a field day with the code base, though.

Thanks for the comment, Meinte!

From your description above and some of the examples here, it seems we had different goals: you wanted to build an actual programming language, whereas I want something that makes it easy to create object models using textual domain-specific languages.

Given that nothing like what I am looking for seems to exist (please, someone prove me wrong and stop me, quick!), this is what I am considering building:

  1. a set of Java annotations mimicking the Ecore metamodel (@Reference, @Attribute, @Classifier, etc.)
  2. a Java-to-Ecore converter that builds an Ecore model from Java code annotated as above (Ecore is an implementation detail)
  3. an Ecore-to-parser converter that, given an Ecore model, produces a parser for a textual DSL on the fly, using a parser combinator framework like funcj.parser.

The intended end result is the ability to obtain usable and reasonably good-looking textual DSLs from regular Java code (no need to spend time learning and fiddling with lexers, parsers, ASTs, etc.).
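Step 1, plus the kind of reflective reading a step-2 converter would do, might look roughly like this. The annotation names follow the plan above, but everything here is an illustrative assumption and may differ from the actual Simon code:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.List;

// Illustrative sketch of step 1: annotations loosely mirroring Ecore
// concepts. Names are assumptions and may differ from the real project.
public class MetamodelAnnotations {
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
    public @interface Classifier {}

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
    public @interface Attribute {}

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
    public @interface Reference { boolean containment() default false; }

    // Example annotated model types, analogous to Kirra's Service/Operation.
    @Classifier
    public interface Service {
        @Attribute String name();
        @Reference(containment = true) List<Operation> operations();
    }

    @Classifier
    public interface Operation {
        @Attribute String name();
    }

    // The kind of reflective reading a Java-to-Ecore converter (step 2)
    // would do to decide whether a feature is a child or a reference.
    public static boolean isContainment(Class<?> type, String feature) {
        try {
            Reference ref = type.getMethod(feature).getAnnotation(Reference.class);
            return ref != null && ref.containment();
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("operations containment: "
                + isContainment(Service.class, "operations")); // true
    }
}
```

Containment is the key bit of metadata mentioned earlier in the thread: it tells the generic parser whether a feature introduces nested child declarations or a by-name reference.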

So, I went ahead and started implementing some of that…

For an example of #1 above, applied to a sample metamodel for UI modeling, see: https://github.com/abstratt/simon/blob/wip/src/test/java/com/abstratt/simon/UI.java

For a test illustrating #2: https://github.com/abstratt/simon/blob/wip/src/test/java/com/abstratt/simon/Java2EcoreTest.java

Hopefully that helps illustrate some of what I have been trying to convey.

No progress on the parser per se yet.