Terminology for YAML-based languages

igor.dejanovic · September 11, 2022, 3:21pm

Hello everyone,

I’ve been thinking about some DSL-related terminology, triggered by a post on Twitter
where, for some project (I can’t recall which one at the moment), the devs said that they
didn’t want to use a DSL, so they chose YAML.

In my view, they did create a DSL; it is just that its syntax is constrained, i.e.
it is a subset of YAML. The language has clear domain semantics and is thus a
DSL. What they eliminated by going with YAML is the problem of implementing a
parser, at the cost of giving up the freedom to specify the syntax.
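To make the split concrete, here is a minimal sketch, assuming a hypothetical deployment-description language (the keys service, replicas, and steps are invented for illustration). A generic parser handles all of the syntax; deciding which well-formed documents are actually meaningful in the domain still has to be written by hand. Python's json module stands in for a YAML parser because JSON is, for practical purposes, a subset of YAML 1.2; with PyYAML installed, yaml.safe_load could take its place.

```python
import json

# Hypothetical "deployment" DSL: the keys and allowed values are invented
# for illustration. json.loads plays the role of a generic YAML parser here.
SOURCE = """
{
  "service": "billing",
  "replicas": 3,
  "steps": ["build", "test", "deploy"]
}
"""

ALLOWED_STEPS = {"build", "test", "deploy"}


def load_deployment(text: str) -> dict:
    """Syntax comes from a generic parser; domain semantics are checked here."""
    doc = json.loads(text)  # generic: knows nothing about deployments

    # Domain rules: the generic parser would happily accept violations of these.
    if not isinstance(doc.get("service"), str):
        raise ValueError("'service' must be a string")
    replicas = doc.get("replicas")
    if isinstance(replicas, bool) or not isinstance(replicas, int) or replicas < 1:
        raise ValueError("'replicas' must be a positive integer")
    unknown = [s for s in doc.get("steps", []) if s not in ALLOWED_STEPS]
    if unknown:
        raise ValueError(f"unknown steps: {unknown}")
    return doc


config = load_deployment(SOURCE)
print(config["service"], config["replicas"])  # billing 3

try:
    # Well-formed for the parser, but meaningless in the domain.
    load_deployment('{"service": "billing", "replicas": 0}')
except ValueError as err:
    print("rejected:", err)
```

The parser never had to be written, but every rule that gives the document its domain meaning did.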

I know that some of you don’t regard what Martin Fowler calls internal DSLs as
DSLs at all. I still find it useful to think of those languages as DSLs in
some contexts. These kinds of languages are built inside a host general-purpose
language (GPL). On the other hand, YAML (or XML, for that matter) is a
meta-language with a predefined syntax, which makes the parsing part generic.

I’m wondering: could/should we call YAML-based languages internal DSLs, since they
piggyback on another language’s syntax? Or are they DSLs at all?

What is your take on this? Do you use some other name/classification when discussing these kinds of languages?

mpostol · September 11, 2022, 3:58pm

Hi @igor.dejanovic, I agree. A YAML-based language is a DSL for sure. It has an alphabet, syntax, and semantics. It is dedicated to a selected application domain, hence it is domain-specific. Another example that comes to mind is XAML (Extensible Application Markup Language). As the name says, it is a language, and because its application scope is limited, it is another DSL.

I don’t know the definition of internal DSL, but my point is that there is no need to introduce a new term.

For me, introducing the term internal DSL makes sense only if we mean a DSL for internal use only: no maintenance, no responsibility, no documentation, no support.

Hope it helps.
Mariusz

igor.dejanovic · September 11, 2022, 4:21pm

Hi @mpostol. Thanks for your comments.

To clarify further: what Fowler refers to as an internal DSL is a clever use of a host language to give the feeling that you are working with a different (domain-specific) language. Lispers have been doing that for decades. Other languages have some capabilities that can be used for this purpose. For example, Sinatra is a Ruby framework that looks like a new language but is standard Ruby. Or take more modern languages: Rust has a powerful procedural macro mechanism that enables building languages with almost arbitrary syntax while still being Rust (the macro body is translated to Rust during compilation) and using the Rust toolchain and processes (for example, the rsx library).

The main advantage of these “languages” is that you can reuse all tools of the host language (compilers, editors, linters…), since what you have is basically the host language used in a clever way to give the impression of a new one.
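As a rough Python analogue of the Sinatra style (Sinatra itself is Ruby), the sketch below builds a tiny routing "language" out of decorators, similar in spirit to Flask's @app.route. The App class and its get/dispatch methods are hypothetical. Because it is ordinary Python, the interpreter, editors, and linters accept it unchanged, although they understand it only as Python, not as a routing language.

```python
from typing import Callable, Dict, Tuple

# A toy, Sinatra-flavoured internal DSL in plain Python.
# App, get and dispatch are hypothetical names chosen for illustration.
Handler = Callable[[], str]


class App:
    def __init__(self) -> None:
        self.routes: Dict[Tuple[str, str], Handler] = {}

    def get(self, path: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a handler for GET requests to `path`."""
        def register(handler: Handler) -> Handler:
            self.routes[("GET", path)] = handler
            return handler
        return register

    def dispatch(self, method: str, path: str) -> str:
        handler = self.routes.get((method, path))
        return handler() if handler else "404 Not Found"


app = App()


# The "DSL" part: it reads like a route declaration, but it is plain Python,
# so all host-language tooling works on it as-is.
@app.get("/hi")
def hi() -> str:
    return "Hello World!"


print(app.dispatch("GET", "/hi"))    # Hello World!
print(app.dispatch("GET", "/nope"))  # 404 Not Found
```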

mpostol · September 11, 2022, 7:26pm

@igor.dejanovic I also want to say thanks for the conversation. In the meantime, I learned a bit more about Fowler’s publications. Of course, that is not enough to argue against his description, but his introduction to DSLs begins with “A Domain-Specific Language (DSL) is a computer language”. Following this, we may say that Polish, Italian, and German, to name only a few, are also computer languages, because I know applications that allow computers to use these languages, for example MS Word. In this respect, we could say that we are using DSLs whose domains are limited by the appropriate country boundaries, but what about emigrants, refugees, etc.? I prefer definitions that apply uniformly. For example, is mathematical notation a computer language? The alphabet is strange, but the syntax and semantics are well known and widely accepted. There was even a seminar on converting mathematics to a selected programming language.

I like the term programming language. For me, it is a DSL dedicated to implementing algorithms. Take UML, for example: I don’t know of any existing compiler that can execute algorithms written in this language, but I know transpilers that convert UML to C# or Java, and the outcome can then be merged with a computer program and executed by a machine (today that is the computer, a binary machine, but tomorrow who knows?).

Again, I try to avoid statements like “to get a feel”, because I cannot prove that your feeling and mine are the same or even similar (how would we measure similarity?). It could lead to us talking about different things at the same time. It could also be called an academic discussion.

I don’t agree with this statement by igor.dejanovic:

> The main advantage of these “languages” is that you can reuse all tools

My point is that if you have different languages, it is impossible to use exactly the same tools. For me, the correct statement is “you can partially use existing tools”. Let me give you an example. Consider this self-explanatory XAML (an XML-based DSL) snippet:

... <Grid> ... </Grid> ...

The question is: what is <Grid/>?

For XML (the language!), it is just an element.
For my students (thoroughly trained on MSDN), it is a picture rendered as an array on the screen.
For me, it is equivalent to new Grid(): the new operator and a constructor call of the class Grid.

In conclusion, if the question is about syntax, you may use an XML parser in this case. If the question is about semantics, you need your own tool to process the meaning of this XML element and create an object that will be rendered on the screen as an array.
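A minimal Python sketch of this two-layer view, assuming hypothetical Grid and Button classes in place of the real WPF types (attribute names are simplified as well): the stock xml.etree.ElementTree parser answers the syntax question and sees only elements, while a small registry supplies the meaning by mapping each element name to a constructor call.

```python
import xml.etree.ElementTree as ET


# Hypothetical widget classes standing in for the real XAML/WPF types.
class Widget:
    def __init__(self) -> None:
        self.children: list = []


class Grid(Widget):
    pass


class Button(Widget):
    def __init__(self, content: str = "") -> None:
        super().__init__()
        self.content = content


# The semantic layer: which element name means which constructor call.
REGISTRY = {"Grid": Grid, "Button": Button}


def build(elem: ET.Element) -> Widget:
    cls = REGISTRY.get(elem.tag)
    if cls is None:
        raise ValueError(f"no meaning defined for <{elem.tag}>")
    widget = cls(**elem.attrib)  # <Button content="OK"/> -> Button(content="OK")
    for child in elem:
        widget.children.append(build(child))
    return widget


source = '<Grid><Button content="OK"/></Grid>'

# Syntax: a generic XML parser is enough, and all it sees is elements.
root = ET.fromstring(source)
print(root.tag, [c.tag for c in root])  # Grid ['Button']

# Semantics: our own tool turns <Grid> into the equivalent of new Grid().
ui = build(root)
print(type(ui).__name__, type(ui.children[0]).__name__)  # Grid Button
```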

If you have two languages, there must be something different between them: the alphabet, syntax, or semantics. It doesn’t matter that one language is derived from another, but you are right that deriving one language from another can bring relief: it partially allows reuse of knowledge, experience, tools, etc.

I know that there are many programming languages and language flavors, for example, versions of programming languages. Today, while working on a video course on information processing, I am trying to define the interface as a language construct: a type definition (declaration?) in C#.

In conclusion, my point is that using the internal/external prefix for DSLs doesn’t bring any relief, but deriving one language from another could be very useful.

Mariusz

igor.dejanovic · September 12, 2022, 5:16pm

@mpostol Thanks for your interesting insights. I appreciate it!

I agree with your correction that, for an internal language, tools can only be partially reused. You are right: while the syntax of the host language is reused, the semantics is new, and the host language tooling can’t be aware of the new semantics. Nevertheless, in many cases it is still easier than building everything from scratch, and that is why people resort to internal DSLs to get their feet wet before embarking on building a language from scratch with full control over syntax and semantics.

Where I find the internal/external prefixes useful is in communication. If I hear “internal Lisp DSL” and I know Lisp, I immediately have some expectations about the language. First, it is a language for technical folks (could it be non-technical?), and the syntax must follow Lisp’s rules (e.g. too many parentheses). On the other hand, any API/framework could be seen as an internal DSL. That’s why I used the term “get the feel”. I think what distinguishes an API/framework from an internal DSL is the intent of the “language” designer to trick the user into thinking (feeling) that she/he is working with a new language. Some languages are better at these tricks, and in those languages internal DSLs are common.
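To illustrate the point about intent, here is a sketch with a hypothetical order model: the same semantics exposed first as a plain API and then as a fluent interface (method chaining), a technique commonly used for internal DSLs. Both produce identical objects; only the second is designed to read like a small language. All names (Order, OrderBuilder, order_for) are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical domain object, invented for illustration.
@dataclass
class Order:
    customer: str = ""
    items: List[str] = field(default_factory=list)
    express: bool = False


# 1) Plain API style: correct, but reads as a sequence of setter calls.
def make_order_api() -> Order:
    order = Order()
    order.customer = "ACME"
    order.items.append("bolts")
    order.items.append("nuts")
    order.express = True
    return order


# 2) Fluent style: same semantics, designed to read like a sentence.
class OrderBuilder:
    def __init__(self, customer: str) -> None:
        self._order = Order(customer=customer)

    def add(self, item: str) -> "OrderBuilder":
        self._order.items.append(item)
        return self  # returning self is what enables method chaining

    def express(self) -> "OrderBuilder":
        self._order.express = True
        return self

    def build(self) -> Order:
        return self._order


def order_for(customer: str) -> OrderBuilder:
    return OrderBuilder(customer)


fluent = order_for("ACME").add("bolts").add("nuts").express().build()
print(fluent == make_order_api())  # True: identical result, different "feel"
```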

The same is true for other classifications/prefixes, e.g. textual/graphical, technical/business, etc. They provide a common vocabulary, set the context, and help communication.

What prompted me to start this discussion is the realization that in some communities DSL seems to strictly mean “a domain-specific textual language built from scratch”. This view is too narrow for me. For me, a language is a DSL if the domain boundaries are clearly defined, the language covers the domain well, and the language provides a good user experience through adequate syntaxes and editor/tool support, regardless of its implementation style. Thus, what Fowler calls an “internal DSL” may be a good DSL for software developers who are comfortable with the underlying language’s syntax rules, but, of course, that implementation style can hardly be used to implement languages for non-technical subject-matter experts (SMEs). And, as you already noted, with this style you have only “solved” the parsing/syntax part; the semantics still has to be defined.

Furthermore, I was wondering whether it is beneficial in any way to consider YAML/XML-based languages “internal DSLs” in the same sense. Clearly, they are not external, as they are not built from scratch and their syntax is constrained by the underlying language.

horacio.hoyos · September 16, 2022, 11:54am

If we stick to the “formal” definition, then I think yes, YAML/XML-based languages are internal DSLs… as long as you don’t add new language constructs or semantics. That is, a DSL is internal if it can be executed/parsed/interpreted by the wrapping language engine.

An internal DSL in XML should not be about the tag names, as a tag is an XML construct with a “name” attribute. To make it an internal XML DSL, you would need to “limit” the constructs you use, for example, forbid namespaces or disallow attributes on elements.
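A sketch of the restriction idea, assuming the two example rules from the post (no namespaces, no attributes). The function name violations and the sample documents are hypothetical. Syntax is still handled entirely by a stock XML parser; the check only rejects constructs outside the agreed subset. It relies on ElementTree reporting a namespaced name as {uri}local and not keeping xmlns declarations in the attribute map.

```python
import xml.etree.ElementTree as ET
from typing import List


def violations(source: str) -> List[str]:
    """Report uses of XML constructs outside the restricted subset.

    Hypothetical subset, following the post: no namespaces and
    no attributes on elements.
    """
    problems = []
    for elem in ET.fromstring(source).iter():
        # ElementTree expands namespaced names to "{uri}local".
        if elem.tag.startswith("{"):
            problems.append(f"namespaced element {elem.tag}")
        if elem.attrib:
            problems.append(f"attributes on <{elem.tag}>: {sorted(elem.attrib)}")
    return problems


print(violations("<order><item>bolts</item></order>"))  # []
print(violations('<order id="7"><item>bolts</item></order>'))
print(violations('<o:order xmlns:o="urn:example"/>'))
```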