Strategies for producing better syntax error messages with ANTLRv4

iklask · November 5, 2021, 6:11pm

Hello All,

I’m new to ANTLR and parsing in general. I have been working on a backend for a Language Server and a transpiler for a simple scripting language with ANTLR. I’ve just started looking into producing better error output when the user types invalid syntax (in the IDE or at compile time).

I was wondering if anyone knew of resources that would help a newb like me out with patterns and strategies for creating smarter syntax error messages from an ANTLR parser/lexer. The messages from the default ErrorListener are decent for debugging the ANTLR grammar, but not useful enough for compiler/IDE error reporting. I’d like something that users can understand without knowledge of ANTLR or my language’s grammar.

All I know at this point is that I should create and use my own ErrorListener and override the syntaxError() method. That method is where I would launch my logic for generating a smarter error message, given the recognizer (lexer or parser), the offendingSymbol, and the line and column number of the symbol.

mike · November 12, 2021, 4:05pm

It’s going to be a bit difficult to do much better than the ANTLR default messages in a generalized fashion. After all, ANTLR is pretty much providing all the information it has access to. You may be able to re-word the message in your own listener but you’ll be limited to the information provided in the call.

There is enough information in the call that you might be able to identify specific error conditions, and override the message with something specific. This could be a lot of work over time, but you might make headway with incrementally identifying errors and providing a more “friendly” message.
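To make the incremental approach concrete, here is a minimal sketch of a message rewriter. It assumes the default English message shapes ANTLR 4 produces ("missing ... at ...", "extraneous input ... expecting ...", "mismatched input ... expecting ...", "no viable alternative at input ...", "token recognition error at: ..."). The class name FriendlyMessages, the method rewrite, and the replacement wording are all made up for illustration. It deliberately uses only the Java standard library so it can be tried without the ANTLR jar.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of ANTLR): rewrites the default message
// text that ANTLR passes to syntaxError() into plainer wording.
final class FriendlyMessages {
    private static final Pattern MISSING =
            Pattern.compile("^missing (.+) at (.+)$");
    private static final Pattern EXTRANEOUS =
            Pattern.compile("^extraneous input (.+) expecting (.+)$");
    private static final Pattern MISMATCHED =
            Pattern.compile("^mismatched input (.+) expecting (.+)$");
    private static final Pattern NO_VIABLE =
            Pattern.compile("^no viable alternative at input (.+)$");
    private static final Pattern TOKEN_ERROR =
            Pattern.compile("^token recognition error at: (.+)$");

    // line is 1-based and charPositionInLine is 0-based, as ANTLR reports them.
    static String rewrite(int line, int charPositionInLine, String antlrMsg) {
        String where = "Line " + line + ", column " + (charPositionInLine + 1) + ": ";

        Matcher m = MISSING.matcher(antlrMsg);
        if (m.matches()) {
            return where + "expected " + m.group(1) + " before " + m.group(2) + ".";
        }
        m = EXTRANEOUS.matcher(antlrMsg);
        if (m.matches()) {
            return where + "unexpected " + m.group(1) + " (expected " + m.group(2) + ").";
        }
        m = MISMATCHED.matcher(antlrMsg);
        if (m.matches()) {
            return where + "found " + m.group(1) + " but expected " + m.group(2) + ".";
        }
        m = NO_VIABLE.matcher(antlrMsg);
        if (m.matches()) {
            return where + "this does not look like a valid statement: " + m.group(1) + ".";
        }
        m = TOKEN_ERROR.matcher(antlrMsg);
        if (m.matches()) {
            return where + "unrecognized text " + m.group(1) + ".";
        }
        // Unknown shape: fall back to ANTLR's own text rather than guess.
        return where + antlrMsg;
    }
}
```

In a real project this would be called from your override of syntaxError() in a BaseErrorListener subclass, passing along the line, charPositionInLine, and msg arguments. Matching on message text is brittle, since the wording could differ between ANTLR versions, so the offendingSymbol and the RecognitionException are sturdier things to inspect once you know which cases matter.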

Generally, the ErrorListener is useful in allowing you to collect the information about syntax errors encountered and provide them in whatever user interface is most appropriate for your application.
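As a rough sketch of the collecting side, something like the following could sit behind the listener. The names SyntaxDiagnostics, Diagnostic, and report are hypothetical, and the class uses only the Java standard library; the assumption is that your syntaxError() override forwards its line, charPositionInLine, and msg arguments to report().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical collector (not part of ANTLR): stores syntax errors so the
// application can decide later how to present them (console, IDE, LSP).
final class SyntaxDiagnostics {

    // One recorded error; line is 1-based, column is 0-based, as ANTLR reports them.
    static final class Diagnostic {
        final int line;
        final int column;
        final String message;

        Diagnostic(int line, int column, String message) {
            this.line = line;
            this.column = column;
            this.message = message;
        }

        @Override
        public String toString() {
            return line + ":" + column + " " + message;
        }
    }

    private final List<Diagnostic> items = new ArrayList<>();

    // Call this from syntaxError(...) instead of printing to stderr.
    void report(int line, int charPositionInLine, String msg) {
        items.add(new Diagnostic(line, charPositionInLine, msg));
    }

    boolean hasErrors() {
        return !items.isEmpty();
    }

    // Read-only view, in the order the errors were reported.
    List<Diagnostic> all() {
        return Collections.unmodifiableList(items);
    }
}
```

With ANTLR on the classpath, you would typically call removeErrorListeners() on the lexer and parser and then addErrorListener() with your own listener, so the default console listener no longer prints and everything ends up in the collector.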

Once you’ve got a parse tree (and encountered any syntax errors), you’ll typically write a listener/visitor to do semantic validation. Those error messages are entirely at your discretion. It might be that in that process you have more context to create more user-friendly versions of the syntax errors as well.

Finally… a clever “trick” that I’ve seen… If your users make a fairly common mistake and you find the default error obtuse, you can add a parser rule to your grammar to “recognize” that construct. When you do this, ANTLR will recognize it and provide a node in the parse tree without reporting a syntax error (after all, you’ve told it how to recognize it). Then in your listener/visitor, if you encounter that parse tree node type, you know that you’ve caught that specific error and can provide as explanatory a message as you’d like. This is a bit counter-intuitive because you’re putting “invalid” rules in your grammar, but there’s no rule against that. You’re telling ANTLR how to recognize a situation, and you can treat any of those nodes as errors.

iklask · December 4, 2021, 2:08am

Thanks for this reply! Sorry for being so late following up, but this is really helpful info. I put off working on this part while dealing with other projects. I’ve noticed some people have extended the DefaultErrorStrategy so they can change the message that is generated when an error is hit.

I might use your “trick” with defining illegal syntax within the grammar. I found this blog which uses a similar strategy by invoking NotifyErrorListeners within the grammar itself.

That being said, I still have a lot to decide on when it comes to catching lexical and syntactic errors. It’s hard to imagine a generic approach when the ANTLR parser does a lot of this behind the scenes!