Compile into the hardware, or ... the hard side of compiling

ldesantis · September 13, 2021, 8:30pm

Hello everyone,
the proliferation of JIT compilation technique in many fields of application, makes me wondering if there is some way to accelerate compilation tasks by dedicated hardware. This is only a first thinking about the topic, I can’t find too much literature on it.

Hardware acceleration fundamentally means:
a. Parallelize tasks as much as possible (more than multi-core)
b. Synthesize specific operators to speed-up computations (like matrix multiplication, for instance)
c. Minimize data transit between processing units and storage units

In this post, I will talk about point a. only.

Let’s see how basic compilation phases can be improved by HW parallelism.

1.Lexical Analysis.
Lexical analysis is based on the finite state machine model, that works on a strictly serial execution.
But many languages allow multiple rules being partially matched by one string; so, if rule-set can be modeled by an NDA (non-deterministic automata) we can think to associate one HW state machine for each rule.
Again, many languages allow partitioning of source code in a set of class/functions so that we can think to analyze multiple pieces of code on multiple pieces of HW.
The same is possible when code is partitioned in multiple files.

2.Parsing.
Parsing of CFGs is based on the model of a state machine plus a stack. Also in this case, many languages allows multiple rule to be partially matched, so we can associate one
piece of hardware to one parsing rule and running them in parallel. Stack’s depth is a critical point in designing such a hardware.

3.Traversing AST
Introducing parallelism in this task is really complex, in my first level of analysis.

4.Symbol resolution
Searching a key in a lookup table is highly parallelizable.

5.Type check
In type check of binary expressions, left and right sub-trees can be analyzed in parallel.

6.Optimization
No ideas

7.Code emission
One can think to partial parallel emission of single function/class, and merging everything at the end

8.Interpretation
If the final code is expressed in terms of data-flow, there are many algorithms to synthesize code on a parallel hardware

The first comment is that compiling into the hardware is really challenging !!
Any further comment/feedback is welcome.

Thanks
Luca

Niko · September 14, 2021, 7:44am

Customized hardware is very expensive – I vaguely recall a minimum of 400.000 units (of something like a washing machine controller) to make any economic sense. Thus, another approach might be: Which part of compilation do we spend most time on, so it would be worth to be accelerated in hardware?

meinte.boersma · September 14, 2021, 8:35am

Not an actual answer to your question, but what about making incremental compilation better so that it’s good enough after an initial (potentially slow) compilation?

cdrews · September 17, 2021, 9:21pm

I don’t think I quite get the use case you have in mind. I can see how type b acceleration could work, but beyond fairly basic operations, a dedicated CPU core will in most cases beat specialized hardware in bang for the buck. Why do you feel compilation in general, and JIT compilation in particular, is a good candidate for hardware acceleration?

ldesantis · September 18, 2021, 7:19am

@Niko @meinte.boersma @cdrews thanks for reply. This is only a free thinking on a technical topic, probably makes no sense .
For sure there is a great interest in big-data applications to any kind of acceleration. There are services of FPGA-on-the-cloud, where you pay only for processing time on the FPGA device, not for buying silicon, so the cost is not a real problem.

I don’t think honestly that compilation is the bottleneck, given all other sources of latencies (networking, processing…) so this is just an exercise. I was surprised to know that in big-data applications there is a massive use of JIT compilation, this fact triggered my attention to the topic, but, I repeat, probably it’s not so useful in practice, and not easy to do.
Thanks
Luca

cdrews · September 21, 2021, 5:42am

I wouldn’t say it makes no sense. In the very least, it keeps you entertained. I agree that FPGAs should work to some degree, although the available clock speed would probably limit performance quite a bit in some stages. My first guess is there is at least some niche use case for hardware compilation. Maybe instead of looking at JIT, you could consider build farms where latency is less of a problem?

ldesantis · September 22, 2021, 8:41pm

Hi @cdrews could you please elaborate more your last question? I didn’t get completely what you are referring to…
thanks
Luca

cdrews · September 29, 2021, 5:52am

Sorry for the slow response. If I understand correctly, the hardest parts you have identified so far are:

the latency between accelerator and the CPU
parts of the compilation pipeline are very hard to implement in hardware
So at first glance, it seems to me like you could focus on the parts that are less hard to implement in hardware first, and expand from there. That probably wouldn’t do any good for JIT due to the latency, but I don’t think that applies to “regular” compilation. Hence, I was wondering if it might be better to look at the problem through the lens of large scale compilation like it happens in build farms. What are your thoughts on that?