Performance when parsing multiple if else statements

barmstrong · October 26, 2020, 6:46pm

I’m using Antlr4 to parse code that looks like

if
else if
else if
…
else

The code parses fine, it’s returning the correct tokens, but the performance is very poor. I believe that every time it goes into an if, it has to do a lot of prediction.
This is in python. I know other languages would be faster, but, to the best of my knowledge, not significantly faster, and at the moment it’s significantly slow.
I guess my question is really, should this code be taking a long time? Are nested if else statements really time consuming, or do I need to just tighten my grammar?

Thanks!

alessio.stalla · October 27, 2020, 4:47pm

In my experience, if it takes a lot of time, refactoring the grammar usually helps. The runtime makes a difference too, but since the algorithm is always the same, in theory, it should only differ by a constant factor. In practice, a given runtime may have bugs or some patterns of execution may be particularly slow in a given language. You should experiment! It’s fairly easy to port your grammar to another target and compare performance over the same inputs.

Also, ANTLR4 “warms up” and becomes faster the more you use the parser, so you may want to run it on a few selected examples before feeding it actual input.

CodingFiend · October 29, 2020, 7:31pm

In my long experience, nested IF statements are not slow at all. The key trick that experienced programmers do, is sort the IF statements in the order of their probability of occurring. If you have 4 IF clauses and one else clause, and clause A has probability 0.1, B has 0.001, C has 0.0004 and D has 0.65, then you put them in the order D, A, B, C, so that the computer has less work to do. Another optimization that is frequently done is that if a whole bunch of the clauses are inside some range, like negative values, you put in an IF statement first to quickly exclude a whole set of conditions.

The only time i see slow IF statements, is when the IF clauses are repeatedly calling some function, like IF longfunc(A) == 3 ELSE IF longfunc(A) == 4, .etc., you should use a temporary variable so that the IF clause value is only computed once.

I realize that these optimizations are not at the compiler level, but at the programmer level, but that is generally where the problem lies; it is bad programming that causes inefficiency; the compilers do their best, but they can’t do some of these optimizations without causing side effects; maybe the longfunc(A) is doing some updating of extern state, and reducing it to one call would cause a malfunction.