I’ve come across this interesting tutorial on how to create a bootstrapping C interpreter with four functions:
The tutorial was written by Jinzhou Zhang, and it’s based on Robert Swierczek’s c4 project:
For people like me, who don’t have a formal education in software engineering, this kind of tutorial is really helpful, because it makes compiler design accessible to a wider public.
I’ve found and read a number of good books on the topic, but most of them tend to be too academic, in the sense that they inevitably cover some theoretical topics, and usually they do so in exactly the same order, which I guess is because they are intended to mirror the classical way the subject unfolds in university courses.
Although the tutorial linked above might not be the best example of English writing (the author is not a native English speaker, and he acknowledges that he hasn’t mastered the language), I see it as a good example of how a tutorial targeting non-experts should be written.
And I would love to see more tutorials like this one.
The problem is not the lack of tutorials on the topic of language/compiler design (on the contrary), but rather that most of them tend to be cut-down versions of what you’ll find in most academic books on the subject. So the problem has more to do with the way the topic is approached.
In my opinion (and the author of the tutorial shares this view), untrained programmers who approach language design are most likely interested in learning how to write a parser by hand, and not how to create one with a parser generator (which is what the majority of books and tutorials teach).
I probably don’t even need to mention that the vast majority of the compiler design tutorials found online deal with implementing a calculator. Surely, parsing these expressions teaches you the basics, operator precedence, the shunting-yard algorithm, and blah, blah. But does anyone really believe that someone who sets out to learn how to create his/her own language is aiming to develop a calculator? The problem is that these tutorials stop there, leaving you with 1000 lines of code on how to emulate a calculator (most likely using yacc, lex, bison or some other generator tool to achieve this). That’s unlikely to quench the thirst for knowledge, or meet the expectations, of those who want to learn how to create a language of some sort.
I’m well aware that there are good reasons why the academic approach is shaped the way it is, and that it’s the soundest approach to the subject; but it’s an approach tailored to formal college education and to the way the various topics are handled in the education system.
For an “outsider” this approach doesn’t play all that well. Theory needs to be backed up by practical examples, and possibly chunked down to a manageable size. I’ve never quite got over the fact that few books on the topic of language engineering actually focus on hand-writing a parser, and that when they do it’s usually an exercise in writing a parser that can build a lexer from EBNF grammars (i.e. an example of how to create a parser generator).
Today we have many developers who don’t necessarily have software engineering training (e.g. web designers who jumped into programming), and who are showing a growing interest in language and parser design. For example, my interest was captured when I started to create my first syntax highlighter definition for an unsupported language, and then started to create syntaxes for editors, etc. This is what forced me to learn about BNF, parsing, etc. (something I wouldn’t have dared to approach otherwise). I then started to realize that these were the same building blocks used in language engineering, and that maybe I could (after all) explore the possibility of experimenting with real languages.
I can’t avoid noticing that tutorials like the one mentioned here are often the result of great efforts by non-specialists to break through the “academic barrier,” succeeding in some measure, and then sharing their findings with others. So many similar articles complain about the “academic barriers” their authors encountered while trying to break into the topic, and note that once these barriers were overcome they could finally see that creating a compiler is not as hard as books on the topic make it look.
I really hope that in the future we might see more books on the topic of language and compiler design, written by engineers for an audience of untrained programmers. And by this I mean books where the author puts aside all his/her formal training, focusing on how to deliver the topic by reducing its scope and complexity, and by turning it into a practical hands-on example of how to create a proof-of-concept (yet working) language (not a calculator). I mean authors who are willing to drop all the non-essential academic jargon in favor of layman’s terms that can bridge the gap and make the topic more accessible.
This great community of ours, Strumenta, which welcomes both the professionals and the enthusiast amateurs, could be the ideal place to try and fill the knowledge gap on this precious topic of language engineering.
After all, when we look back at when email was born, who would have ever thought that soon everyone would be using it on a daily basis, regardless of age and education, and carrying their email client with them wherever they went? Back then, email was the exclusive domain of high-tech, long-bearded wizards living (Oops, working) in some basement/lab at MIT or some other rocket-science facility. The same goes for many other computer-related technologies, which were probably not even designed (let alone envisioned) to be used by the masses. But then came the “computer revolution”, and now everyone carries in his/her pocket a smartphone far more powerful than the Pentagon’s supercomputers of those days.
Computers play such a big role in our daily lives that computer science has leaked into culture, and the boundaries between experts and users have become blurry, since the two often overlap in real-world usage. Language design is no exception, and it’s definitely a domain on which even non-engineers have set their eyes.
Surely, there’s always going to be a limit on how far an untrained non-engineer can venture into this field. But this is not a reason to declare it a “no go” zone; on the contrary, limits can always be surpassed if one is willing to catch up with studies, as is often the case. The problem right now, it seems to me, is that we lack a well-trodden path bridging the gap between academia and amateurs when it comes to language and compiler design … which brings us back to square one, the tutorial linked at the beginning of this post.
What makes the Write a C Interpreter tutorial so special (at least to me)?
First of all, it’s worth mentioning that its author, Jinzhou Zhang, was actually a computer engineering student who dropped out of college right before the compiler design course was about to begin. He mentions this in his tutorial, which is full of personal thoughts and insights that are just as precious as the code. So, he’s neither a total amateur nor a fully trained engineer; he stands somewhere in between, having some grounding in academia, and at the same time being left with a thirst for knowledge which he needs to quench.
The actual code was taken from a third-party project, the c4 repository by Robert Swierczek, described as “an exercise in minimalism”. Jinzhou Zhang had sufficient training to work his way through the code, and a view of the topic that balances the needs of the untrained against the academic ways of presenting it. This enabled him to break the original code into multiple steps, around which he could build a step-by-step tutorial targeting untrained programmers. And I think that he succeeded in doing so (and the 2.5k stars on his repository confirm this).
There are hundreds of compiler design tutorials in the wild, but only a few of them seem to hit the mark of bridging the gap that separates academia and amateurism. Bridging this gap is the role of adventurers, and not an easy task either, for it requires striking a delicate balance between the needs and limits of those who lack the means, on the one hand, and the solid foundations upon which these sciences were erected, on the other.
There probably isn’t a single correct way to go about it, but some people will simply fare better than others at the task, and others will follow in their footsteps, until a well-trodden path is established. Well-trodden paths eventually evolve into main routes, which attract services for the travelers, which lead to an economy (motels, restaurants, bars), and eventually a town is built around a main road.
My guess is that, eventually, this is how the current gap will be bridged (figuratively speaking). The clear signs of demand are out there for all to see, but supply is still short and remains an open challenge. No one says it’s going to be easy; on the contrary, it’s quite hard to deliver in (i.e. translate to) layman’s terms that which was naturally learned through the language of scientific specialism, because the whole educational path is designed to be as smooth a transition as possible, whereas the gap we’re speaking about is full of bumps and wrinkles that need ironing out.
Probably the best (if not the only) way to fill this gap is by paying attention to those who succeeded (to some degree) in these early steps, and by mutual feedback between the trained and untrained, between provider and consumer (so to speak). And I can’t think of a better place than this community, Strumenta, for this task, since it includes both professionals and amateurs, and offers a trusted (and closed) space where it’s possible to discuss the topic without starting flame wars.
I hope that the tutorial link and my shared thoughts on this topic might inspire those who wish to open the doors of language design to non-experts, to some degree or another, e.g. by writing books, creating video tutorials, etc.