How to describe DSLs?

ftomassetti · May 26, 2020, 9:04am

I am reflecting on how to present DSLs, for educational and marketing purposes.

The thing that obsesses me is that many people simply have no idea what DSLs are, and perhaps we are doing a poor job of explaining them.

I think that as a community we could benefit a lot if we learned how to communicate better what DSLs are.

So I would like to share my attempts and start a discussion over this. If I had to explain what they are about I would say that they are about:

making it possible for non-developers to do things that normally only developers can do.

Would you agree with that?

Expanding it a bit more we could say that:

The DSL allows the expert to explain their expertise via a DSL that gives feedback to them until the explanation is consistent and is executable. Then they click a button and a “program” is generated so that others in the office can use the generated “program”

I discussed this with people on my newsletter, and some very good points emerged. Here are some:

A DSL is a notation for the language of the problem area (finance, mechanical engineering, retail inventory, whatever) and not in the language of programming computers. Therefore a person familiar with (and expert in?) the problem area can describe the program using his ordinary (though technical) language - nouns, verbs, adjectives - and expect them to mean the same things they do when he talks about the subject to his peers. And thus, the program he creates with the DSL will behave - and compute! - the way he expects it to.
Unlike unrestricted natural language, the DSL will either reject ambiguous sentences or select a single standard default interpretation of them, ignoring any alternative meanings.
A DSL for me is: A language tailored for the user at the right level of abstraction and containing the key concepts to make it easy to express business problems and solutions in a given domain.
A DSL defines a common lingua in the same way as defined in Domain-Driven Design.
A DSL is a language designed to be used by someone who is a novice in programming but an expert in something else. A good DSL avoids details of how the computer will solve the problem. Instead, it concentrates on describing the problem using the terminology that the domain experts use when talking to each other. Designing a good DSL is quite hard because it requires a good understanding between the domain experts and the language designers. Once that is done, however, using the DSL is quite easy, at least for the domain experts. DSLs are a good idea because they bend the computer to fit the shape of the domain expert’s knowledge rather than the other way around.

What do you think?

jurgen.vinju · May 26, 2020, 10:08am

This is a very interesting discussion; I think we can all benefit a lot from making this more clear.

side note: “When and How to Develop Domain-Specific Languages” by Mernik et al. is certainly a good resource.

making it possible for non-developers to do things that normally only developers can do

I think this may be a red herring. Especially concerning the sales perspective, DSLs are not a goal but a means. It is usually not anybody’s goal to get non-programmers to do things programmers do, especially from the perspective of someone who does not understand the concept of a DSL.

To describe DSLs from the perspective of a customer, I think we should start with the problems that DSLs solve and then how they solve them, rather than describing what they are.

Relevant problems that DSLs address:

business / software misalignment, due to communication noise and communication lag between stakeholders and designers/implementers;
high maintenance cost of legacy code due to tangled business/technical complexity and high code volume;
high design and implementation costs of new features due to “full stack” integration with existing systems

The DSL allows the expert to explain their expertise via a DSL that gives feedback to them until the explanation is consistent and is executable. Then they click a button and a “program” is generated so that others in the office can use the generated “program”.

Especially the “gives feedback” part is where the heavy lifting goes. Code generation is great for staged reuse (see the great paper “A perspective of generative reuse” by Biggerstaff (1998) and the “Generative Programming” book by Czarnecki), and it gives you reproducibility and portability and all that, but… it’s a highly technical perspective which does not address the main benefit of DSLs.

…the great benefit of a DSL is that the problem description is 100% without accidental technical complexity, purely focused on the domain at hand. And this can lead to:

early manual feedback cycles (designers reflect about the solution and its possible implications in an intuitive and truthful manner)
early automated feedback (type checkers, static analysis, model checking, theorem proving)
rapid prototyping (customer involvement, “agile”) (this does not have to be the code that is integrated into a customer’s system, just a simulation environment which helps the customer to understand what it is they are making)

So in other words: “DSLs are a better medium for communication between software stakeholders and software implementors; leading to improved communication quality, interactive and automated quality assurance, and as a bonus, generated code.”
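To make the “early automated feedback” and “generated code” points a bit more concrete, here is a minimal Python sketch. Everything in it is hypothetical and only for illustration: the one-line rule syntax, the tier names, and the `check` and `generate` functions are assumptions, not taken from any real DSL or tool.

```python
import re

# Hypothetical domain vocabulary; a real DSL would get this from the domain experts.
CUSTOMER_TIERS = {"gold", "silver", "bronze"}

# Hypothetical rule syntax: "discount <N>% for <tier> customers"
RULE = re.compile(r"^discount (\d+)% for (\w+) customers$")


def check(rule_text):
    """Return a list of feedback messages; an empty list means the rule is accepted."""
    match = RULE.match(rule_text.strip())
    if match is None:
        return ["expected: discount <N>% for <tier> customers"]
    percent, tier = int(match.group(1)), match.group(2)
    problems = []
    if percent > 100:
        problems.append(f"discount of {percent}% exceeds 100%")
    if tier not in CUSTOMER_TIERS:
        problems.append(f"unknown customer tier '{tier}'")
    return problems


def generate(rule_text):
    """Turn an accepted rule into an executable pricing function."""
    problems = check(rule_text)
    if problems:
        raise ValueError("; ".join(problems))
    match = RULE.match(rule_text.strip())
    percent, tier = int(match.group(1)), match.group(2)

    def price(amount, customer_tier):
        # Apply the discount only to the tier named in the rule.
        if customer_tier == tier:
            return amount * (100 - percent) / 100
        return amount

    return price
```

The expert only ever sees domain terms (discounts, tiers) in the feedback. The generated function is the “bonus” that comes once the description is consistent.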

jurgen.vinju · May 26, 2020, 10:16am

BTW, if you have feedback on our new site: http://www.swat.engineering where we put the sales perspective of DSLs online, please send an email. We’re building DSLs left and right already. We’re ready to scale up with a few more language engineers now that we have this Rascal-based DSL design and implementation train on the road, so more customers are welcome! We came out of start-up stealth mode last week for this purpose.

ftomassetti · May 26, 2020, 11:53am

Yes, I think this is a long-term goal for our community, and this could be beneficial for all of us

Noted, thank you

Right. My goal was to explain what kind of capabilities a non-developer could get, without using a definition that could be confusing for someone who is not a developer or does not already know about DSLs. The only way I found was by using an analogy, but I am not extremely happy with the result and I hope this discussion can help define a clearer message.

Agreed

This is not something I frequently think of. I think that DSLs may make code more maintainable, but I have never referred to legacy code when describing DSLs.

The first connection I see reading this is with requirements. It almost gives me the impression that DSLs are about collecting requirements. Do you think that is an analogy that is worth exploring?

ftomassetti · May 26, 2020, 12:12pm

Email just sent.

And now I am writing something more because the minimum answer should be >20 characters

jurgen.vinju · May 26, 2020, 2:38pm

Yes! I think requirements are especially where DSLs shine. A requirements and design process supported by tools, via language.

Although the DDD book by Evans has only one single paragraph (!!!) about DSLs in the entire book, when you read the book from the perspective of DSLs everything rings true.
The “Rapid Prototyping” book by McConnell was also an inspiration.

jurgen.vinju · May 26, 2020, 2:53pm

BTW, maybe it’s not so bad if a DSL is still operated by a “software engineer”, while “pair programming” with the domain specialist. At least in our experience this seems to be a common use case. It spares the domain specialist from having to learn how to interact with a formal language and its tools, and still improves the quality of the communication.

ftomassetti · May 26, 2020, 6:19pm

This is absolutely true, and in some cases I think this is the outcome we should aim for. However, in the effort to present a clear and simplified view, I would not introduce this scenario immediately.

jurgen.vinju · May 27, 2020, 9:37am

@voelter I think I’d like to pick your brain on this one. Must DSLs be pitched for end-users? Or are other modi operandi more pragmatic?

voelter · May 27, 2020, 3:26pm

Happy to have my brain picked

Pier_Mario_De_Pra · May 29, 2020, 4:09pm

What about simply showing some DSL at work?

What about a prototype (playground), and letting the user play with it?

meanwhale · May 30, 2020, 5:12am

DSLs are impractical for general use, outside of their domain.

ftomassetti · June 1, 2020, 5:38am

Yes, but for that to work the DSL should be very specific to the niche of the target user. It would be impractical to create many examples, but I agree that this could be a good approach if one wants to explain what a DSL is to a very specific niche.

meanwhale · June 1, 2020, 2:22pm

Maybe a counterexample could be useful. For instance, you can make C++ a math language with some operator overloads and libraries. But that doesn’t make C++ a DSL. A DSL’s design is more customized for its purpose.
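A small sketch of that counterexample, using Python in place of C++ (the point is the same in any general-purpose language with operator overloading). The `Vec2` class is a made-up name for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vec2:
    """Hypothetical 2D vector type with math-like operators."""
    x: float
    y: float

    def __add__(self, other):
        # Component-wise vector addition.
        return Vec2(self.x + other.x, self.y + other.y)

    def __mul__(self, scalar):
        # Scaling by a number.
        return Vec2(self.x * scalar, self.y * scalar)


# This reads like math, but it is still ordinary Python: loops, I/O, classes
# and every other general-purpose feature remain available to the author.
result = (Vec2(1, 2) + Vec2(3, 4)) * 2
```

The notation got closer to the domain, but nothing was taken away or restricted, and error messages still come from the host language rather than from the domain.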

igor.dejanovic · June 2, 2020, 12:58pm

In my experience, this is an excellent approach. The nice thing about it is that in the beginning it helps with communication and requirements analysis, while at later stages the domain expert, by watching the engineer quickly write the specification in a language with a familiar syntax, eventually loses the fear of the new language/tool and expresses the will to actually try to specify solutions by her/himself.

When re-engineering legacy systems, DSLs are a viable target for the new system. Analyzing the domain knowledge and business rules/processes in the legacy system code is hard and costly, and once that knowledge is recovered, putting it into another general-purpose language will just create another legacy system, and the analysis will have to be repeated after some number of years. IMHO, using a DSL as the target should prevent this from happening over and over again.

jurgen.vinju · June 3, 2020, 8:46am

Agreed; we’ve been doing some of that model extraction from C++ code, in a research project with a partner in embedded systems. I think the promise is really good, but the details of extracting valuable high-level information from legacy multi-threaded C++ code are hairy.

Our current tack is to approach model extraction also in a domain specific way; we inform the code analysis tools of local assumptions they can work on based on expert knowledge from the code owners. And then we try to verify these assumptions later down the line when we have the models expressed in a DSL, using regression testing and model checking. It seems to work, but there are many many ifs and buts. There are many factors in play. Very interesting stuff. Love to work on it.

solmi.riccardo · June 5, 2020, 9:40am

I would like to try to correct a bias I see too often by adding a complementary statement on DSLs. They are about:

making it possible for a developer to do things that normally only dozens of developers can do.

We have to keep focusing on empowering ourselves because the vanilla distributions of mainstream languages and tools are definitely not enough to meet current software needs.

With respect to the goal of presenting DSLs to people without prior knowledge on the subject, I think it is useful to share our experience and understanding of DSLs to broaden our knowledge on the many facets of what DSLs are.
Therefore, when we meet a potential DSL customer, based on his actual knowledge and needs, we will be able to select the minimum definition/proposal that he can recognize as valuable for him.

In the initial phase of involvement, whenever I have actively tried to provide the customer with a complete picture, this has caused confusion in him and the feeling that a DSL was too much for him.

ftomassetti · June 5, 2020, 9:49am

I think this is an interesting point of discussion. When I started being interested in DSLs I saw them as a tool mainly intended for me and other developers like me. My first ideas for using MPS were about building Java extensions. Over the years instead I ended up focusing more and more on DSLs for non-developers.

Now, are they different things that should have different names?

Personally, I think I focused more on DSLs for non-developers because I found that developers have a harder time agreeing that a tool created by another developer can help them. For example, I think about the amazing IDEs out there, which cost millions to develop, and still most developers expect them to be free or extremely cheap.

In general it is easier to show the value of a DSL for non-developers because it is about making possible something that was otherwise impossible, while DSLs for developers are about increasing productivity and maintainability, which are things that have to be measured. I still think they are immensely valuable, I just find them harder to market. For this reason I am very interested in hearing more experience reports of successful DSLs for developers, and I think your experience with this is particularly interesting.

mpostol · May 20, 2021, 9:14pm

I have carefully read this discussion because there is a similar one in the context of What is Information Model vs Semantic-data. To answer this question first we must agree on the definition of the language itself. Next, let’s add what the domain specific term means.

I propose to group languages as follows: Languages grouping. Based on this discussion, let me assume that we are talking about languages to define information (knowledge) in a form that can be read by a computer program. In other words, a computer is a consumer of the outcome. On the other hand, the language must be ready to be used as a design means by a human, i.e. readable and reusable by a designer.

Under this assumption let me propose the following definition:

The language is three sets:

1. alphabet - set of characters we can use

2. syntax - set of rules we are applying to check the correctness of the concatenation of the characters (text - character streams)

3. semantics - set of rules we are using to associate the correct text and the text meaning.

Ad 1. Computers always use the binary alphabet (only two characters are allowed). Designers (humans) prefer to use an alphabet derived from a native language. A trade-off between these two environments is encoding - a set of rules we can use to convert text into a compliant binary stream and back in a mutually unambiguous manner. If a graphical language is considered, there must be a compiler, because the graphical alphabet is generally useless for the automation of data processing using computers.
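As a quick illustration of encoding as a two-way, unambiguous mapping, here is a minimal Python sketch. UTF-8 is just one possible choice of encoding, and the sample text is made up.

```python
# Hypothetical sample text written in the human-oriented alphabet.
text = "rule: discount 10%"

# Text -> byte stream, using UTF-8 as the (assumed) encoding.
binary = text.encode("utf-8")

# The same bytes written out over the two-character alphabet {0, 1}.
bits = "".join(f"{byte:08b}" for byte in binary)

# Byte stream -> text; the round trip restores the original exactly.
restored = binary.decode("utf-8")
```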

Ad 2. As a result of having encoding (a mutually unambiguous relationship between binary and text representations), there is no need to redefine the semantics and syntax used in the computer and human environments. My point is that this is the main reason why JSON, XML, YAML, XAML, etc. are so popular. UML is an example of a graphical language, but to exchange pictures and track changes a domain-specific language based on XML is in daily use. To get pictures, a dedicated graphical user interface (GUI) must be supported by an application.

Ad 3. In the case of programming languages, the semantic rules are strictly observed, and the association between the correct text (clauses) and its meaning is usually more or less formally defined. The metalanguages (BNF, EBNF, ABNF, custom) could change, but we can say that the rules are clearly stated - semantic rules are defined in the context of the syntax. For final validation, we can use a compiler. For the compiler, the source text is just input data to be processed. I like to ask: when does a text become a program? The answer: when the compiler doesn’t complain.
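The three-set definition can be sketched for a deliberately tiny, made-up language: sums of single digits such as "1+2+3". This is only an illustration under that assumption. The names `ALPHABET`, `uses_alphabet`, `is_well_formed` and `meaning` are hypothetical.

```python
# Alphabet: the set of characters we can use.
ALPHABET = set("0123456789+")
DIGITS = set("0123456789")


def uses_alphabet(text):
    """True if every character of the text belongs to the alphabet."""
    return all(ch in ALPHABET for ch in text)


def is_well_formed(text):
    """Syntax: digits and '+' must alternate, starting and ending with a digit."""
    if not text or len(text) % 2 == 0:
        return False
    return all(
        (ch in DIGITS) if i % 2 == 0 else (ch == "+")
        for i, ch in enumerate(text)
    )


def meaning(text):
    """Semantics: associate a correct text with the number it denotes."""
    if not (uses_alphabet(text) and is_well_formed(text)):
        raise ValueError(f"not a sentence of the language: {text!r}")
    # Digits sit at the even positions of a well-formed text.
    return sum(int(ch) for ch in text[::2])
```

Here `meaning` plays the role of the compiler in the question above: a text counts as a program of this toy language exactly when `meaning` accepts it without complaining.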

A real challenge with this definition arises when the semantics (knowledge domain) is defined long before the language used to represent it. Sometimes we must also deal with a “knowledge domain” that is internally inconsistent. Maybe in this case we should use the term notation instead of language. Not sure - waiting for your proposals.

jurgen.vinju · May 27, 2021, 6:39am

Categorizing languages, absolutely. I’ve tried and failed several times, until I realized that languages are not the interesting thing to categorize; it’s the language processors.

Every language that exists has processors that define the role of the language at the moment of applying the processor. When new processors are added, new roles emerge. The new roles would change the classification without anything in the language itself having changed.

For example, a compiler makes a language a human-centered source language. The next day someone writes a parser generator that produces practically unreadable parsers in the same language. Now it’s a machine-oriented target language. Etc. Etc. So a language is not identified by its role. It has several roles depending on the existence of processors for these roles.

Categorizing language processors goes very naturally. The current list of processors characterises the language at a given moment in time. It makes more sense, and avoids funny categories.
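A rough Python sketch of this idea: the roles of a language are derived from the processors that currently exist, so adding a processor changes the classification without touching the language. The `Processor` record, the `roles` function and the example processors are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Processor:
    name: str
    reads: str   # language the processor consumes
    writes: str  # language the processor produces


def roles(language, processors):
    """Derive the roles a language plays from the processors that exist today."""
    found = set()
    for p in processors:
        if p.reads == language:
            found.add("source")
        if p.writes == language:
            found.add("target")
    return found


# Day one: only a compiler exists, so C is purely a source language.
processors = [Processor("compiler", reads="C", writes="assembly")]
before = roles("C", processors)

# The next day someone adds a parser generator that emits C.
processors.append(Processor("parser generator", reads="grammar", writes="C"))
after = roles("C", processors)
```

Nothing about C itself changed between `before` and `after`; only the list of processors did.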

The people of the SLE conference have also stopped categorizing languages. They are all simply “software languages”. The goal is to engineer the processors, so that’s what is good to focus on.

Cheers!