This post is a compilation of two of my prior posts in the Introduction category.
I plan to revise this post as questions and comments, if any, pop up.
I use LLVM myself for a hobbyist compiler. I’m no expert, but you are always welcome to ask me if there’s something you are unsure about.
Introduction
LLVM originally stood for “Low-Level Virtual Machine” (the project no longer treats the name as an acronym), which is a bit of a misnomer in my view, as LLVM is first and foremost a fairly generic compiler back-end. LLVM is built around a low-level intermediate representation (IR), which is aptly named “LLVM IR”. LLVM offers tons of features, such as Just-in-Time compilation (JITing): you can generate LLVM IR and then have LLVM convert it into executable code, which can be invoked directly from the compiler or tool. LLVM is implemented in C++, but has both C and Python bindings. However, it is my belief that you pretty much need to use the C++ interface if you want to link in LLVM, as only the C++ interface, being the native one, seems to be actively maintained and kept up to date.
Primers
You may want to check out this online book: Mapping High Level Constructs to LLVM IR.
I wrote it some years ago, but it now has a new maintainer, Mike Rodler, who has done a great job of converting my old document into an online book. I know that there are some issues with the examples (check out the list of issues on GitHub), but I think they still serve well as an introduction to LLVM IR.
Also, there are a few books about LLVM out there:
- LLVM Essentials, Sarda et al., 2015, Packt Publishing.
- Getting Started with LLVM Core Libraries, Lopes et al., 2014, Packt Publishing.
- LLVM Cookbook, Pandey et al., 2015, Packt Publishing.
I must admit that I have not yet read any of these books, but I have listed them in the order I think they should be read.
Layers
There are two layers that you can use when you want to work with LLVM:
- LLVM IR as a textual representation which is input to the LLVM tools. This is the method I use because I don’t want to be bound by internal API changes and don’t want to have to relink and republish my work whenever a new version of LLVM is published.
- LLVM bitcode which is a binary representation of LLVM IR. The relationship between LLVM IR and LLVM bitcode is roughly like the relationship between assembly source code and an object file.
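To make the textual layer concrete, here is a minimal sketch in Python that writes a tiny LLVM IR module to a `.ll` file. The function name `add`, the helper `make_module`, and the file name `example.ll` are placeholders I made up for illustration.

```python
from pathlib import Path


def make_module() -> str:
    """Return the text of a tiny LLVM IR module containing one function."""
    lines = [
        "; A hand-assembled module: i32 add(i32, i32)",
        "define i32 @add(i32 %a, i32 %b) {",
        "entry:",
        "  %sum = add nsw i32 %a, %b",
        "  ret i32 %sum",
        "}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The output is plain text, so any language that can write a file will do.
    Path("example.ll").write_text(make_module())
```

If the LLVM tools are installed, `llvm-as example.ll -o example.bc` should turn the text into bitcode and `llvm-dis example.bc -o example.ll` should turn it back, which is the assembly/object-file relationship described above.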
Tools
There are a bunch of tools in the LLVM toolchain, but you can do most things simply by using the C language front-end (`clang`) and giving it an input file with one of the extensions it recognizes, such as `.bc` (bitcode), `.ll` (LLVM IR), and so forth.
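As a rough sketch of what that looks like when driven from a script, the Python helper below builds `clang` command lines for a few common outputs. The helper name `clang_command` and the file names are my own placeholders; the flags (`-c`, `-S`, `-emit-llvm`, `-o`) are regular clang options. The script only prints the commands, so it runs even without clang installed.

```python
import os

# Extra flags needed for each kind of output file; an unknown or missing
# extension means "link an executable", which needs no extra flag.
FLAGS = {
    ".bc": ["-c", "-emit-llvm"],  # LLVM bitcode
    ".ll": ["-S", "-emit-llvm"],  # textual LLVM IR
    ".s": ["-S"],                 # native assembly
    ".o": ["-c"],                 # native object file
}


def clang_command(source: str, output: str) -> list:
    """Build a clang command line; clang infers the input kind from its extension."""
    suffix = os.path.splitext(output)[1]
    return ["clang", *FLAGS.get(suffix, []), source, "-o", output]


if __name__ == "__main__":
    for out in ("example.bc", "example.s", "example.o", "example"):
        print(" ".join(clang_command("example.ll", out)))
```

To actually run one of these, pass the list to `subprocess.run(cmd, check=True)`. Linking an executable only works if the module defines `main`.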
Tips
- Initially, I’d suggest using LLVM IR as the output of your compiler and then invoking `clang` to translate the LLVM IR into LLVM bitcode. This is much easier to work with and reduces the impact of internal changes to the code base (LLVM is very actively developed). The C++ APIs used to generate LLVM bitcode tend to change quite often and sometimes quite drastically. There is also a C API, but the last time I checked it (some years ago), it offered only a fairly small subset of the C++ API.
- Don’t bother with Static Single Assignment (SSA) form initially. Computing the proper temporaries using SSA is quite difficult and even the LLVM samples warn against doing this. Instead, generate code that uses the `alloca` instruction to allocate storage on the stack and let the `mem2reg` pass figure out how to convert this into SSA. I started out without knowing this and therefore wasted some time on trying to compute the proper SSA form, only to realize that the LLVM documentation itself warns against doing so.
- If you have trouble figuring out how to do something with LLVM, the easiest approach is to write a tiny C or C++ program that does what you want and then translate it to LLVM IR using this command:
  `clang -fno-asynchronous-unwind-tables -fno-exceptions -fno-rtti -Wall -Wextra -masm=intel -O3 -S -emit-llvm -g0 "$1" -o "$1.ll"`
- Don’t waste energy on writing a Runtime Library (RTL) in LLVM IR. LLVM IR is meant to be generated by a program, not least because of the SSA form. Hand-writing LLVM IR is pretty tedious and tiresome.
- In my experience, it is much easier to develop against LLVM on Linux. The Windows support is fairly complete, if not complete, but you need to install Microsoft Visual Studio (a no-go in my world), whereas you can install LLVM and Clang on Ubuntu Linux just using `sudo apt install clang-9 llvm-9` or `sudo apt install clang llvm`, depending on your version of Ubuntu Linux.
- Make sure you don’t use `readnone` as an attribute on your functions unless they don’t access memory at all: the LLVM tools generate invalid code, with no warning, if your code does access memory even though `readnone` is specified.
- LLVM does not natively support Unicode, so you have to output UTF-8/UTF-16/UTF-32 values as byte values. This is by far the weakest point in LLVM, IMHO.
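The `alloca` tip is easier to see with an example. Below is a minimal sketch, in Python, of a generator that keeps every mutable variable in a stack slot and never computes a phi node itself. The `Emitter` class and the `sum_to` function are hypothetical helpers of mine, not part of any LLVM API. I also assume a recent LLVM where pointers are written `ptr`; older releases (such as the LLVM 9 mentioned above) need `i32*` instead, which is why the pointer type is a parameter.

```python
class Emitter:
    """Collects LLVM IR lines for one function (hypothetical helper)."""

    def __init__(self, ptr_type="ptr"):
        self.lines = []
        self.ptr = ptr_type  # assumption: "ptr" for recent LLVM, "i32*" for old
        self.counter = 0

    def emit(self, text):
        self.lines.append(text)

    def temp(self):
        self.counter += 1
        return f"%t{self.counter}"

    def local(self, name):
        # One stack slot per source-level variable.
        self.emit(f"  %{name} = alloca i32")

    def store(self, value, name):
        self.emit(f"  store i32 {value}, {self.ptr} %{name}")

    def load(self, name):
        tmp = self.temp()
        self.emit(f"  {tmp} = load i32, {self.ptr} %{name}")
        return tmp

    def binop(self, op, lhs, rhs):
        tmp = self.temp()
        self.emit(f"  {tmp} = {op} i32 {lhs}, {rhs}")
        return tmp


def sum_to(ptr_type="ptr"):
    """IR for: total = 0; for (i = 0; i < n; i++) total += i; return total."""
    e = Emitter(ptr_type)
    e.emit("define i32 @sum_to(i32 %n) {")
    e.emit("entry:")
    e.local("total")
    e.local("i")
    e.store("0", "total")
    e.store("0", "i")
    e.emit("  br label %cond")
    e.emit("cond:")
    less = e.binop("icmp slt", e.load("i"), "%n")
    e.emit(f"  br i1 {less}, label %body, label %done")
    e.emit("body:")
    e.store(e.binop("add", e.load("total"), e.load("i")), "total")
    e.store(e.binop("add", e.load("i"), "1"), "i")
    e.emit("  br label %cond")
    e.emit("done:")
    result = e.load("total")
    e.emit(f"  ret i32 {result}")
    e.emit("}")
    return "\n".join(e.lines) + "\n"


if __name__ == "__main__":
    print(sum_to(), end="")
```

Saving the output as `sum_to.ll` and running `opt -S -passes=mem2reg sum_to.ll -o sum_to.ssa.ll` (older releases spell it `opt -S -mem2reg`) should replace the loads and stores with phi nodes, so the generator never has to work out SSA on its own.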
Examples
There are very good examples on the LLVM website. I recommend studying at least the Kaleidoscope sample before you start out on LLVM.
Braceless
Braceless is my name for my hobbyist programming language, which is unlikely to ever become a usable product. But it can be used to see an example of how I have made use of LLVM v8+ by generating LLVM IR from a Python script and how I invoke the LLVM tools to translate the generated LLVM IR into an executable file. Currently, not much is going on publicly on the Braceless project, but I am working on it now and then. I am in the middle of a very large refactoring project so updates are postponed until I’m finished with that.
One way to start out with LLVM would be to clone my Braceless GitHub project and try to get it going on your system. As far as I recall, the version on GitHub does generate a valid executable (it probably core dumps), but it would give you something to start out with.
Domain-Specific Languages (DSLs)
Can LLVM potentially be used for DSLs? I believe so, although I have no experience in this area. LLVM can be linked into any C++ application, and its Just-In-Time compiler can be used to generate native code in the running process, so that a DSL could potentially be translated into native code and run without any external compilation step. Please add your views and/or experiences to this thread.
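As a rough illustration of the DSL idea, here is a minimal sketch in Python that translates a tiny integer expression language into an LLVM IR function. Two caveats: it borrows Python's own `ast` module as the parser instead of a real DSL front-end, and it produces textual IR rather than using the in-process JIT described above. The function `compile_expr` and the `%t1`, `%t2` naming scheme are my own inventions, and a parameter named like one of those temporaries would clash.

```python
import ast

# Supported binary operators and their LLVM IR opcodes.
OPS = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul"}


def compile_expr(name, params, source):
    """Translate an integer expression over `params` into an LLVM IR function."""
    body = []
    counter = 0

    def visit(node):
        nonlocal counter
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return str(node.value)
        if isinstance(node, ast.Name) and node.id in params:
            return f"%{node.id}"
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            lhs = visit(node.left)
            rhs = visit(node.right)
            counter += 1
            body.append(f"  %t{counter} = {OPS[type(node.op)]} i64 {lhs}, {rhs}")
            return f"%t{counter}"
        raise ValueError(f"unsupported expression: {ast.dump(node)}")

    result = visit(ast.parse(source, mode="eval").body)
    args = ", ".join(f"i64 %{p}" for p in params)
    header = f"define i64 @{name}({args}) {{"
    return "\n".join([header, "entry:", *body, f"  ret i64 {result}", "}"]) + "\n"


if __name__ == "__main__":
    print(compile_expr("area_plus_one", ["w", "h"], "w * h + 1"), end="")
```

An expression has no reassignment, so every temporary is defined exactly once and the output is already in SSA form; a DSL with mutable variables would need the `alloca` approach from the tips.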
Epilogue
I’m willing to edit and update this reply if anyone asks questions or offers suggestions. I’m confident that the above is lacking a lot, but you have to start somewhere.