Hi again Luca,
I decided to write a short primer on LLVM. I hope you, and others, can use it for something. If any of you have questions, feel free to ask!
Introduction
LLVM originally stood for “Low-Level Virtual Machine”, which is a bit of a misnomer in my view, as LLVM is above all a fairly generic compiler back-end (the project itself no longer treats the name as an acronym). LLVM is built around a low-level intermediate representation (IR), aptly named “LLVM IR”. LLVM offers tons of features, such as Just-in-Time compilation (JITing): you generate LLVM IR, have LLVM convert it into executable code, and invoke that code directly from your compiler or tool.
Layers
There are two layers that you can use when you want to work with LLVM:
- LLVM IR as a textual representation, which is the input to the LLVM tools. This is the method I use because I don’t want to be bound by internal API changes and don’t want to have to relink and republish my work whenever a new version of LLVM is published.
- LLVM bitcode, which is a binary representation of LLVM IR. The relationship between LLVM IR and LLVM bitcode is roughly like the relationship between assembly source code and an object file.
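To make the textual layer a bit more concrete, here is a minimal sketch in Python of a tool that produces LLVM IR as plain text. The function name `emit_module` is made up for illustration; the IR itself is nothing more than a `main` that returns a constant.

```python
def emit_module(exit_code: int) -> str:
    """Return the text of an LLVM IR module whose main() returns exit_code."""
    lines = [
        "; hypothetical module produced by a toy compiler",
        "define i32 @main() {",
        "entry:",
        f"  ret i32 {exit_code}",
        "}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the IR; redirect it into a .ll file to feed it to the LLVM tools.
    print(emit_module(42), end="")
```

The point is that no LLVM library is linked into the generator at all: the only contract is the text format.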
Tools
There are a bunch of tools in the LLVM tool-chain, but you can do most things simply by using the C language frontend (clang) and passing it an input file with one of the supported extensions, such as .bc (bitcode), .ll (LLVM IR), and so forth.
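As a rough sketch of how a tool might drive clang on a generated .ll file, the Python below builds the command lines without running them. The helper names `clang_commands` and `run_if_available` and the file names are hypothetical; the flags (`-c`, `-emit-llvm`, `-o`) are ordinary clang options as I understand them, so check them against your clang version.

```python
import shutil
import subprocess


def clang_commands(ll_path: str, stem: str, clang: str = "clang") -> dict:
    """Build (but do not run) clang command lines for a textual IR file."""
    return {
        # .ll -> LLVM bitcode (.bc)
        "bitcode": [clang, "-c", "-emit-llvm", ll_path, "-o", f"{stem}.bc"],
        # .ll -> native object file
        "object": [clang, "-c", ll_path, "-o", f"{stem}.o"],
        # .ll -> linked executable
        "executable": [clang, ll_path, "-o", stem],
    }


def run_if_available(cmd: list) -> bool:
    """Run cmd only if its program is on PATH; return True if it succeeded."""
    if shutil.which(cmd[0]) is None:
        return False
    return subprocess.run(cmd, check=False).returncode == 0


if __name__ == "__main__":
    for kind, cmd in clang_commands("answer.ll", "answer").items():
        print(kind, "->", " ".join(cmd))
```

Keeping the command construction separate from the execution makes it easy to log or test what would be run before clang is even installed.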
Tips
- Initially, I’d suggest using LLVM IR as the output of your compiler and then invoking clang to translate the LLVM IR into LLVM bitcode. This is much easier to work with and reduces the impact of internal changes to the code base (LLVM is very actively developed). The C++ APIs used to generate LLVM bitcode tend to change quite often and sometimes quite drastically. There is also a C API, but last time I checked it (some years ago), it offered only a fairly small subset of the C++ API.
- Don’t bother with Static Single Assignment (SSA) form initially. Computing the proper SSA temporaries yourself is quite difficult, and even the LLVM samples warn against doing this. Instead, generate code that uses the alloca instruction to allocate storage on the stack for each variable and let the mem2reg pass figure out how to convert this into SSA. I started out without knowing this and therefore wasted some time on trying to compute the proper SSA form, only to realize that the LLVM documentation itself warns against doing so.
- If you have trouble figuring out how to do something with LLVM, the easiest approach is to write a tiny C or C++ program that does what you want and then translate it to LLVM IR using this command (from a shell script, where $1 is the source file):
  clang -fno-asynchronous-unwind-tables -fno-exceptions -fno-rtti -Wall -Wextra -masm=intel -O3 -S -emit-llvm -g0 "$1" -o "$1.ll"
- Don’t waste energy on writing a Runtime Library (RTL) in LLVM IR. LLVM IR is meant to be generated by tools rather than written by hand: because of the SSA form, hand-writing LLVM IR is pretty tedious and tiresome.
- In my experience, it is much easier to develop against LLVM on Linux. The Windows support is largely, if not fully, complete, but you need to install Microsoft Visual Studio (a no-go in my world), whereas you can install LLVM and Clang on Ubuntu Linux just by using sudo apt install clang-9 llvm-9 or sudo apt install clang llvm, depending on your version of Ubuntu Linux.
- Make sure you don’t use readnone as an attribute on your functions unless they really don’t access memory. If your code does access memory even though readnone is specified, the behavior is undefined and the LLVM tools will generate invalid code, with no warning. (Newer LLVM releases spell this attribute memory(none).)
- LLVM does not natively support Unicode, so you have to output UTF-8/UTF-16/UTF-32 text as raw byte (or integer) values. This is by far the weakest point in LLVM, IMHO.
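Two of the tips above (alloca instead of hand-made SSA, and Unicode as bytes) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `emit_sum_to_n`, `utf8_constant` and `@sum_to_n` are names I made up, and the default `ptr` pointer spelling assumes a recent LLVM with opaque pointers (15 or later); pass `"i32*"` for older releases such as LLVM 9.

```python
def emit_sum_to_n(ptr_ty: str = "ptr") -> str:
    """Emit IR for i32 @sum_to_n(i32 %n) using stack slots, not hand-built SSA."""
    lines = [
        "define i32 @sum_to_n(i32 %n) {",
        "entry:",
        "  %acc = alloca i32",            # one stack slot per mutable variable
        "  %i = alloca i32",
        f"  store i32 0, {ptr_ty} %acc",
        f"  store i32 1, {ptr_ty} %i",
        "  br label %cond",
        "cond:",
        f"  %i.cur = load i32, {ptr_ty} %i",
        "  %more = icmp sle i32 %i.cur, %n",
        "  br i1 %more, label %body, label %done",
        "body:",
        f"  %acc.cur = load i32, {ptr_ty} %acc",
        "  %acc.new = add i32 %acc.cur, %i.cur",
        f"  store i32 %acc.new, {ptr_ty} %acc",
        "  %i.new = add i32 %i.cur, 1",
        f"  store i32 %i.new, {ptr_ty} %i",
        "  br label %cond",
        "done:",
        f"  %result = load i32, {ptr_ty} %acc",
        "  ret i32 %result",
        "}",
    ]
    return "\n".join(lines) + "\n"


def utf8_constant(name: str, text: str) -> str:
    """Return a global holding text as NUL-terminated UTF-8 bytes."""
    data = text.encode("utf-8") + b"\x00"
    body = ""
    for b in data:
        ch = chr(b)
        # Keep printable ASCII readable; escape everything else, plus the
        # quote and backslash characters, as two-digit hex escapes.
        if 0x20 <= b < 0x7F and ch not in '"\\':
            body += ch
        else:
            body += f"\\{b:02X}"
    return f'@{name} = private unnamed_addr constant [{len(data)} x i8] c"{body}"'


if __name__ == "__main__":
    print(utf8_constant("greeting", "héllo"))
    print(emit_sum_to_n(), end="")
```

Note that the loop needs no phi nodes: every read and write of a variable goes through its stack slot, and running the mem2reg pass over the result is what turns those slots back into SSA registers.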
Examples
There are very good examples on the LLVM website. I recommend studying at least the Kaleidoscope sample before you start out on LLVM.
Braceless
Braceless is my name for my hobbyist programming language, which is unlikely to ever become a usable product. But it can be used to see an example of how I have made use of LLVM v8+ by generating LLVM IR from a Python script and how I invoke the LLVM tools to translate the generated LLVM IR into an executable file. Currently, not much is going on publicly on the Braceless project, but I am working on it now and then. I am in the middle of a very large refactoring project so updates are postponed until I’m finished with that.
One way to start out with LLVM would be to clone my Braceless GitHub project and try to get it going on your system. As far as I recall, the version on GitHub does generate a valid executable (it probably core dumps), but it would give you something to start out with.
I’m willing to edit and update this reply if anyone asks questions or offers suggestions. I’m confident that the above is lacking a lot, but you have to start somewhere.
Cheers,
Mikael Egevig