Documenting your languages

A recurring theme for us is how to document the languages we work with.

This is true both for languages for which we build parsers and for DSLs with projectional editors.

Right now we are focusing on textual languages. We would like to build a tool to generate documentation (as HTML or PDF) starting from the grammar. @alessio.stalla is currently working on this.

We could:

  • Describe the grammar rules using railroad diagrams
  • Show examples for each rule, taking them automatically from a set of example files
  • Extract comments from the grammar
  • Show how the information present in examples would be put in a parse tree
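As a rough illustration of the "extract comments from the grammar" idea only, here is a minimal Python sketch. It assumes an ANTLR-style grammar in which rules are preceded by `/** ... */` doc comments; the sample grammar, the regular expression, and the function names `extract_rule_docs` and `render_docs` are all hypothetical and are not part of the tool being built. A real implementation would more likely parse the grammar properly than rely on a regex.

```python
import re

# Hypothetical ANTLR-style grammar fragment, used only for illustration.
GRAMMAR = """
grammar Expr;

/** An assignment binds the value of an expression to a name. */
assignment : ID '=' expr ';' ;

/** An expression is a sum of one or more integer literals. */
expr : INT ('+' INT)* ;

ID  : [a-z]+ ;
INT : [0-9]+ ;
"""

# A doc comment (/** ... */) immediately followed by a rule name and a colon.
# The tempered dot keeps a match from running past the end of its own comment.
DOC_RULE = re.compile(
    r"/\*\*((?:(?!\*/).)*)\*/\s*([A-Za-z_]\w*)\s*:", re.DOTALL
)


def extract_rule_docs(grammar_text):
    """Return a dict mapping each documented rule name to its comment text."""
    docs = {}
    for comment, rule in DOC_RULE.findall(grammar_text):
        # Drop the leading '*' decoration on each line of a multi-line comment.
        lines = [line.strip().lstrip("*").strip() for line in comment.splitlines()]
        docs[rule] = " ".join(line for line in lines if line)
    return docs


def render_docs(docs):
    """Render the extracted comments as plain text, one line per rule."""
    return "\n".join(f"{rule}: {text}" for rule, text in docs.items())


if __name__ == "__main__":
    print(render_docs(extract_rule_docs(GRAMMAR)))
```

Rules without a doc comment (here `ID` and `INT`) are simply skipped, which also makes it easy to report which rules are still undocumented.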

For DSLs built in MPS, we have in the past made it possible to add comments and then extract them, together with diagrams showing the related concepts (concepts extended or referred to) and tables listing the properties.

I was wondering if you had any specific approach to documenting languages, or any advice on what worked for you.

I think you are asking the very generic question of “how to document software”. In other words, I think the challenges of documenting DSLs are the same as documenting other software (or, even more general, documenting products).

To answer whether or not a way of documenting is any good, I think one has to be more specific about what the goal of the documentation is.
First, is it meant to have value for language engineers or language users? Second, is it meant as a tool to onboard new language engineers / language users, or is it meant to be used constantly by all engineers / users during the lifetime of the DSL?

I think the effort of extensively documenting a language for engineers only makes sense when the language has reached a certain level of maturity. If the language design is still very much in flux, it is too expensive to keep manually created documentation up to date.
In my experience, language engineering teams are usually rather small, and onboarding new members usually works best by looking at the code together and by giving them small, self-contained tasks to gradually explore a language or family of languages.
Having a 10,000-foot view of your language that conveys the general structure and the intent behind some design decisions might help, but I think only if we are talking about a very complex language or a family of languages.
Again, the semantics of a language as well as technical intrinsics (e.g. whether the language compiler features specific optimizations) can vary vastly, and therefore I don’t think there is a silver bullet for documenting these.

If we are talking about documentation for language users, I’m a strong believer that languages with semantically rich, meaningful error messages help users best to learn a language. One feature of DSLs in particular is that they should have great affordance, so I think the need for extensive, external documentation is already a smell of bad DSL design.

The only things we ended up handing out for the (projectional) DSLs I was involved with were compact cheat sheets on how to use the editor (e.g. a cheat sheet for the most important keyboard shortcuts) and the extended IDE tooling that we built for the language.

Being the Elm fanboy that I am, I have to mention that the format of how Elm packages have to be documented is something I really like. The documentation happens inside the package source code via markdown: https://package.elm-lang.org/help/documentation-format
I have looked at dozens of Elm packages so far, and I think the unified format in which they are documented helped me a lot in assessing and learning them.

Yes, in many respects that is true; however, there may be some specificities in documenting languages. For example, when documenting textual languages one could use railroad diagrams, so experience in documenting languages is more relevant here than experience in documenting software in general.

Very good point. Personally I am more interested in documentation for language engineers, as documentation for users is more “generic”; I think it varies a bit more between projects.

True, I am thinking of a case in which the language starts to be stable, and I would be very interested in whatever we can generate automatically.

A first use-case I am thinking about very concretely is the documentation of ANTLR grammars. Someone builds, refactors, or migrates them, and then they are maintained over 5, 10, or 20 years by a team of developers who are not necessarily experts in parsing or language engineering. What kind of documentation can we give them to make their lives easier?

The second use-case is, we build a DSL using MPS and then we hand over the maintenance to a group of developers working at the client. How can we make their lives easier?

The third use-case is, we build a DSL using MPS and we want to provide documentation for the users of the language. How can we build it and maintain it as the language slightly evolves over time?

Generating documentation automatically makes sense, and even more so if the language is volatile. As a direct example (albeit not purely for textual languages) when targeting language users, MetaEdit+ (a tool I use) creates language help automatically, based on the language definition and data entered by language engineers. Language users can then choose Help from the menu to see details of the language concepts, their properties, naming rules, notational symbols, constraints, and available generators. Part of the help and documentation is also contextual, depending on the language use scenario. A big plus of this is that even in volatile cases, or at early stages of the language definition, the automated Help system provides correct information (plus no need to document things afterwards :wink:).

In addition to this basic language help, the usefulness of the documentation/help depends directly on what kind of additional information is given for each language construct. In addition to a definition or description of the concept, language engineers may also enter details on how to identify elements, suggest a naming policy (if not already explicitly specified in the metamodel), and provide guidance, patterns, or proposed modeling practices to be followed. These are then also provided in the integrated help/documentation.
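To make the general idea concrete, here is a minimal Python sketch of help text generated from a language definition plus information entered by language engineers. It is not MetaEdit+'s API or data model: the `Concept` and `Property` classes, their fields, and the `help_page` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Property:
    """A property of a concept, as declared in the (hypothetical) metamodel."""
    name: str
    type_name: str


@dataclass
class Concept:
    """A language concept plus the extra guidance entered by language engineers."""
    name: str
    description: str = ""
    naming_policy: str = ""
    properties: list = field(default_factory=list)


def help_page(concept):
    """Build a plain-text help page for one concept from its definition."""
    lines = [concept.name, "=" * len(concept.name)]
    # The structural part is always available, even before anyone writes prose.
    lines.append(concept.description or "(no description entered yet)")
    if concept.naming_policy:
        lines.append(f"Naming: {concept.naming_policy}")
    if concept.properties:
        lines.append("Properties:")
        lines.extend(f"  - {p.name} ({p.type_name})" for p in concept.properties)
    return "\n".join(lines)


if __name__ == "__main__":
    state = Concept(
        name="State",
        description="A named situation in which the machine waits for an event.",
        naming_policy="Use a noun or adjective, e.g. Idle or Running.",
        properties=[Property("name", "String"), Property("isInitial", "Boolean")],
    )
    print(help_page(state))
```

Because the page is rebuilt from the definition each time, renaming a property or adding a concept changes the help without any separate documentation step, which is the point made above about volatile languages.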

As people sometimes also prefer having the language described in additional documents, slides, etc., we then copy the automatically produced help. While we may also consider giving the metamodel itself as a document, it does not normally serve language users. Instead, they want to see concrete examples. Examples are particularly good since they already have some context and, being part of the tool, can be used to learn the language in an interactive manner. Ready example models also allow running generators to produce code, documents, and other outputs.

Contrary to what is often thought, creating the examples does not require much effort, since they are typically already available: they were created to test the language and pilot it in a small project. In fact, the models created in the pilot are often used as examples for the language users. These examples can be provided separately or in a tutorial section, so language users can see them while working on their actual projects.
