Techniques for Tree Simplification

hetianmu · December 23, 2022, 11:13pm

Greetings all - first post here…

I have used ANTLR to parse a custom language that consists of comparisons (e.g., ABC >= 123) connected with parentheses, AND, and OR. In my abstract syntax tree, I drop the parentheses and I have AND nodes, OR nodes, and comparison nodes, where the AND and OR are binary and comparisons do not have children.

I have an ANTLR context where some nodes are grouped together, because those are meant to translate down to one node, in my abstract syntax tree. This one node can derive from between one and six comparison nodes, plus AND and OR nodes, and the resultant node will vary according to the combination of these context nodes.

The trees generally consist of less than twenty nodes (i.e. comparisons). Some exceed a thousand, but don’t require the aforementioned node consolidation

I am using the ANTLR C# library. Questions are:

Is this consolidation something that can be solved with ANTLR’s ParseTreePatternMatcher in C#?
What would a “simple” solution look like? Keeping a stack of nodes within the context and checking against a list for the longest match, then attaching the new node to the parent of the context?
Is this a problem for a “term rewriting” system? I’m not familiar with these so just asking in case it is relevant.

If anyone can share there experience I would be grateful - of course!

drazen.dotlic · January 3, 2023, 3:12pm

Hey,

I’ve done my fair share of tree rewriting in .NET and here’s what I know:

Thanks for pointing this out - had no idea it existed! It doesn’t sound as useful to your use case, since it allows for finding nodes, but not navigating the (syntax node) tree IIUC.
IMO, the simplest (but hardest) solution is to first transform the concrete syntax tree - it’s what the ANTLR produces when it parses the input - into an AST. This is not hard to do using the base visitor produced by the ANTLR itself. Once you have an AST I find it much easier to manipulate because so much “noise” has been stripped. You can traverse the AST any way you like, but something simple where you carry a state object (into which you store whatever “context” you need) into each method call which is called for each AST node (starting from the root) worked very well for me. The biggest limitation is that it’s typically hard to explore nodes up in the hierarchy. Multiple passes remedy this to an extent.
Not sure if term rewriting is the correct name for the technique. Your best bet, if you aren’t too experienced and tree rewriting is too low level for you, would be to find some code which does portions of this work for you. You’re in luck, there exists an Open Source library from Strumenta which they call StarLasu, a family of parsing helpers in several languages, including C#. You should note that the C# “port” (they started with Kotlin) seems to be one of (in not the most) recent additions to the family and may be missing functionality compared to libraries written in Kotlin or Java or JavaScript, but it will surely give your parsing a boost. There was even an online presentation about it recently which I believe was recorded; that would get you up to speed even faster I presume.

Hope this helps and happy parsing!

hetianmu · January 4, 2023, 1:39am

Thank you for taking the time to share your expertise!

I’ll try your idea to carry a state object during the AST walk.

Will be sure to check-out StarLasu code, too!