Beginners' Tutorial: Write a C Interpreter

I’ve come across this interesting tutorial on how to create a bootstrapping C interpreter with four functions:

The tutorial was written by Jinzhou Zhang, and it’s based on Robert Swierczek’s c4 project:

For people like me, who don’t have a formal education in software engineering, this kind of tutorials are really helpful, because they make compiler design more accessible to a wider public.

I’ve found and read a number of good books on the topic, but most of them tend to be too academic, in the sense that they inevitably cover some theoretical topics, and usually they do so in the same exact order — which I guess has to do with the fact that they are intended to mirror the classical way the subject unfolds in university courses.

Although the above linked tutorial might not be the best example of English writing (the author is not a native English speaker, and he acknowledges not mastering the language), I see it as a good example of how a tutorial targeting non-experts should be written.

And I would love to see more tutorials like this one.

The problem is not the lack of tutorials on the topic of language/compiler design (on the contrary), but rather that most of them tend to be cut-down versions of what you’ll find in most academic books on the subject. So the problem has more to do with the way the topic is approached.

In my opinion (and the author of the tutorial shares my view) is that non-trained programmers who approach language design are most likely interested in learning how to write a parser by hand, and not how to create one with a parser generator (which is what the majority of books and tutorials do).

I probably don’t even need to mention that the vast majority of the compiler design tutorials found online deal with implementing a calculator — surely, parsing these expression teaches you the basis, and operators precedence, and the shunting-yard algorithm, and blah, blah — but does anyone really believe that someone who sets out to learn how to create his/her own language is thinking to develop a calculator? The problem is that these tutorials stop there, leaving you with 1000 lines of code on how to emulate a calculator (most likely using yacc, lex, bison or some other generator tool to achieve this). That’s unlikely something that’s going to quench the thirst for knowledge and expectations of those who want to learn how to create a language of some sort.

I’m well aware that there are good reasons why the academic approach is shaped they way it is, and that it’s the soundest approach to the subject; but it’s an approach tailored to match the formal education of colleges, on how the various topics are handled in the education system.

For an “outsider” this approach doesn’t play all that well. Theory needs to be backed up by practical examples, and possibly chunked down to a manageable size. I’ve never quite got around the fact that few books on the topic of language engineering actually focus on hand writing a parser, and that when they do it’s usually an exercise on how to write a parser that can build lexer from EBNF grammars (i.e. an example on how to create a parser generator).

Today we have many developers who don’t necessarily have a software engineering training (e.g. web designers who jumped into programming), which are showing a growing interest in language and parsers design. For example, my interest was captured when I stared to create my fist syntax highlighter definition for an unsupported language, and then started to create syntaxes for editors, etc. This is what forced me to learn about BNF, parsing, etc. (something I wouldn’t have dared to approach otherwise), and then started to realize that these were the same building blocks of language engineering, and that maybe I could (after all) explore the possibility of experimenting with real languages.

I can’t avoid noticing that tutorials like the one mentioned here are often the result of great efforts by non-specialists to break through the “academic barrier,” succeeding in some measure, and then sharing their findings with others. So many similar articles complain about “academic barriers” encountered in their attempts to break through the topic, and that once these barriers were overcome they could finally see that creating a compiler is not that hard as books on the topic make it look like.

I really hope that in the future we might see more books on the topic of language and compiler design, written by engineers for an audience of untrained programmers. And by this I mean books where the author puts aside all his/her formal training, focusing on how to deliver the topic by reducing its scope and complexity, by turning it into a practical hands-on example on how to create a proof of concept (yet working) language (not a calculator). Authors who are willing to drop all the non-essential academic jargon in favor of layman terms that can bridge the gap and make the topic more accessible.

This great community of ours, Strumenta, which welcomes both the professionals and the enthusiast amateurs, could be the ideal place to try and fill the knowledge gap on this precious topic of language engineering.

After all, when we look back at when emails were born, who would have ever thought that soon everyone would be using them on a daily basis, regardless of age and education, and carrying their emails client with them wherever they went? Back then, emails were the exclusive domain of high-tech, long-bearded wizards living (Oops, working) in some basement/lab at MIT or some other rocket-scientists facility. The same goes with many other computer related technologies, which were probably not even designed (let alone envisioned) to be used by the masses. But then came the “computer revolution”, and now everyone carries in his/her pocket a smart phone device far more powerful than the Pentagon’s super computers of those days.

Computers play such a big role in our daily lives that computer science has leaked into culture, and the boundaries between the domain of expertise and users have become blurry, since they often overlap in real world usage. Language design is no exception, and it’s definitely a domain upon which even non-engineers have set their eyes on.

Surely, there’s always going to be a limit on how far an untrained, non-engineer can venture into this field. But this is not a reason to declare it a “no go” zone, on the contrary. Limits can always be surpassed, if one is willing to catch up with studies, as it’s often the case. The problem right now seems to me that we lack a well-trodden path bridging the gap between academia and amateurs when it comes to language and compiler design … which brings us back to square one, the tutorial linked at the beginning of this post.

What makes the Write a C Interpreter tutorial so special (at least to me)?

First of all, it’s worth mentioning that its author, Jinzhou Zhang, was actually a computer engineering student who dropped out of college right before the compiler design course was about to begin. He mentions this in his tutorial, which is full of personal thoughts and insights that are just as precious as the code. So, he’s neither a total amateur nor a fully trained engineer, he stands somewhere in-between, having some grounding into academia, and at the same time being left with a thirst for knowledge which he needs to quench.

The actual code was taken from a third party project, the c4 repository by Robert Swierczek, described as “an exercise in minimalism”. Jinzhou Zhang had sufficient training to work his way through the code, and a balanced view on the topic between what are the needs of the untrained and the academic ways of presenting it, which enabled him to chunk up the original code into a multi-step approach around which he could tailor a step-by-step tutorial targeting untrained programmers. And I think that he succeeded in doing so (and the 2.5k stars on his repository confirm this).

There are hundreds of compiler design tutorials in the wild, but only few of them seem to hit mark of bridging the gap that separates academia and amateurism. Bridging this gap is the role of adventurers, and not an easy task either, for it requires striking a delicate balance between the needs and limits of those who lack the means, on the one hand, and the solid foundations upon which these sciences were erected, on the other.

There probably isn’t a single correct way to go about it, but some people will simply feature better than others at the task, and others will follow their footsteps, until a well-trodden path is established. Well trodden-paths eventually evolve into main routes, which attract services for the travelers, which lead to an economy (motels, restaurants, bars), and eventually a town is build around a main road.

My guess is that, eventually, this is how the current gap will be bridged (figuratively speaking). The clear signs of demand are out there for all to see, but supply is still short and an open challenge — no one says it’s going to be easy; on the contrary, it’s quite hard to deliver in (i.e. translate to) layman terms that which was naturally learned through the language of scientific specialism, because the whole educational path is designed to be as smooth a transition as possible — whereas the gap we’re speaking about is full of bumps and wrinkles that needs ironing out.

Probably the best (if not only) way to fill this gap is by paying attention to those who succeeded (in some degree) in these early steps; and by mutual feedback between the trained and untrained, between provider and consumer (so to speak). And I can’t think of a better place than this community, Strumenta, for this task, since it includes both professionals and amateurs, and offers a trusted (and closed) space where it’s possible to discuss the topic without starting flame-wars.

I hope that the tutorial link and my shared thoughts on this topic might inspire those who wish to open the doors of language design to non-experts, in some degree or another, e.g. by writing books, creating video tutorials, etc.

4 Likes

Hi, let me do a quick comment on your post. Industry level compilers are complex objects: good tutorials can help you to jump faster in the topic, but only hard work and learning will bring you to have the job done.
Again, almost everything can be learnt by-doing, but without a strong knowledge of fundamentals you will never do the step ahead. To design an house, you have to know the science of structures, it’s a mere illusion that a good computing tool is enough.

1 Like

Thanks for your reply @ldesantis,

Industry level compilers are complex objects: good tutorials can help you to jump faster in the topic, but only hard work and learning will bring you to have the job done.

The point is that I believe that many people don’t want to learn how to create industry level compilers, but simple DSL or interpreters.

Again, almost everything can be learnt by-doing, but without a strong knowledge of fundamentals you will never do the step ahead.

The problem with knowledge of fundamentals is that one needs to know what to look for and where — which is exactly what you won’t find in an academic text, because they target an audience which it’s assumed to have gone (or being undergoing) a full training where tutors will be feeding them all the knowledge they need, in a pre-planned order which has a broad spectrum of skill as final expertise. It’s not like there’s a university degree on compilers design only.

We have to assume that intelligent people who have managed to learn any hard sciences will have the ability to learn something new, provided they know where to start from. A nuclear physicist or chemical engineer will probably have no clue how computers work, but there’s no reason to think that he/she won’t be able to learn a new subject if provided the right directions where to start looking. But we can’t expect him/her to start all the education training from scratch, from kinder garden to a doctorate in computer engineering; we should instead isolate the relevant topics in a pertinent manner.

To design an house, you have to know the science of structures, it’s a mere illusion that a good computing tool is enough.

I thought that I had made that point clear in my post, i.e. that there will always be limits to the self-taught approach. But while I agree that to build a skyscraper you’ll need very strong foundation in building structures, materials resistance, weather damages, etc., it’s also true to the average man can build a garden shed with far less knowledge.

There’s no reason why a skyscraper engineer can’t teach a layman how to build a garden shed; but it will require delivering a reduced version of his/her knowledge, and adapting it to the immediate needs of the shed-builder. Often the technical jargon is one of the major barriers, since academic text books opt to call things with their technical names (sometimes a pipe is just a pipe, so we could just call it “pipe”).

But surely, promoting the idea that it’s impossible to build even an arithmetic expressions interpreter without an engineering degree is not going to motivate people to try and approach the field. On the other hand, motivating amateurs to experiment creating a simple parser could inspire them, and who knows, if they discover they really love it they can always go back to school and get their degree (at least, here in Italy we don’t have age limits preventing people to going back to college).

It’s worth remembering that computer engineering is one of the very few professions which doesn’t have a “professional order” in most (if not all) countries in the world — unlike physicians, psychologists, and many other professions — even though it’s a profession that has a direct impact on national security issues (actually, because of that). A programmer is a programmer anywhere he/she goes in the world, without necessarily needing a professional order to recognize and certify his/her talent, whereas even the best heart surgeon is not allowed to give medical advise (let alone practice) in a foreign country (it would be a crime to do so). And the industry and government agencies are willing to hire anyone who has proved his/her worth in the practical field, regardless of credentials, including self-made hackers who managed to breach high profile systems. Here is where the similitude with the house builder ends, because if a wanna-be compiler designer fails no one is going to be hurt or die, even if he/she fails very badly.

3 Likes

It’s probably a matter of personal thinking patterns because for me personally starting from a grammar and generating a parser is far easier than coding a parser by hand. I’ve done both. Some people prefer going top-down, other people prefer going bottom-up. There’s space for all kinds of different approaches.

2 Likes

@alessio.stalla:

It’s probably a matter of personal thinking patterns because for me personally starting from a grammar and generating a parser is far easier than coding a parser by hand. I’ve done both. Some people prefer going top-down, other people prefer going bottom-up. There’s space for all kinds of different approaches.

This is definitely true and something important to consider, because different thinking ways tend to appreciate different exposition methods too. American psychologist Robert Sternberg researched this topic with his colleagues for a long time (600+ publications) and formulated the Triarchic Theory of Intelligence, which aims at helping teachers and writers better understand the learning patters of their students/audience:

He also wrote a number of non academic books on the topic, targeting teachers, speakers and writers, in which he explains how to create a lesson (or write a book) in a manner take takes into account the three major learning patterns, by presenting each concept in a way that will be appreciate by all learner types.

I’ve found his work very interesting because it really broadens your views on communication in the context of teaching anything. He also carried out extensive research in children schools around the world to sustain his theory by demonstrating that very often poor schools grades are not the fault of the students but the fact that teachers only deliver knowledge in a single thinking pattern, making it hard for differently thinking kids to absorb the lessons.

Computer related concepts are already quite complex in themselves; writing good teaching books requires not only a sound knowledge in the topic at hand, but also good writing skills and, possibly, a basic understanding of the different learning models. Unfortunately, too often books on computer related subjects are packed with sound knowledge but written in an awful way.

For some authors simply have the gift of writing. For example, I consider Randall Hyde (author of the well known Write Great Code book series) one of the best computer writers. His books are simply wonderful, he manages to explain complex concepts in a very neat manner, making them easy to understand. Somehow he’s able to lead the reader’s attention right where he wants it go, and the whole learning experience is seamless. His books are highly successful and have undergone many reprints, and that’s not because of a lack of material on the topic, but because he’s great at writing.

I wish every computer book I bought was as good a read, but the truth is that unreadable books are the norm rather than the exception. After all, programming tends to be a solitary experience, and if one is not coding at the keyboard he’s probably thinking about a problem and how to solve it; so it’s not exactly a profession that promotes communication skills (as opposed to jobs like public relations, lawyers, etc.). And probably that’s why academic teachers in our field tend to write better books than inventors and discoverers — or, e.g., a system admin who works and lives 24/7 in an underground super high-security data-server bunker, carved in a mountain or dug in Alaska during the cold war, or in some abandoned mine converted into a data center (not exactly a job that stimulates social and communication skills).

2 Likes

For people interested in his books you can buy them al low cost. They are in offer in Humble https://www.humblebundle.com/books/learn-you-more-code-no-starch-press-books for the next 18 days (and 12 more books about Rust, Java, Shell, Powershell, C C++, Python, … are included in the bundle for less than 15 euros).

1 Like

Thanks for the tip! I’ve purchased the full bundle; it’s worth it just for the eBooks I was missing, and because some of them were updated editions.

@tajmone, there is a new Humble Bundle offer of programming books from O’Reilly’s Head First series. One of them is “Head First Kotlin”, a language that comes up a lot in our meetups…

1 Like