Hello
I will try to keep this short.
We try to apply clean code principles all the time.
Sometimes we succeed sometimes not so much.
However, when working on a big and complex project such as an IDE Code completion engine,
divide et impera, single responsibility, open close, dry etc are needed more than ever.
That being said, let us consider a small snippet of code:
1 void Test()
2 {
3 int index = 0;
4
5 if (condition1)
6 {
7 int count = 0;
8
9 where (condition2)
10 {
11 bool testing = false;
12 }
13 }
14 }
Now, here we have three code sections.
A method, two child sections if and where.
This code will translate to an AST that might look like this:
Test ( method node )
------------> = ( operator node )
/ \
( lhs ) index 0 ( rhs )
- [int]
-----> if ( if section node )
------> condition1
---------> = ( operator node )
/ \
( lhs ) count 0 ( rhs )
- [int]
-> where ( where section node )
-----> condition2
-----------> = ( operator node )
/ \
( lhs ) testing false ( rhs )
- [bool]
===============================
lhs = left hand side operand
rhs = right hand side operand
===============================
Using recursive descent we will be able to traverse this ast tree and execute / compile the code.
We are interested in generating the AST so i will not insist on traversing the AST here.
We need three types of parsers here
- a method parser
- an if parser
- a where parser
Ok. Why three parsers ?
Single responsibility.
The project is complex, later on, on debugging things could get realy ugly.
- When compiling, we extract a stream of tokens for the whole code section
void
Test
(
)
{
int
index
0
;
if
(
condition1
)
{
int
count
0
;
where
(
condition2
)
{
bool
testing
false
;
}
}
}
Now let’s take a look at the state machine inside a parser.
We start by declaring a set of states ( let’s say, inside an enum )
enum MethodParserState
{
VoidKeyword,
OpenParamsParanth,
ParamType,
ParamName,
InsideMethodBody
AccumulateIfTokens
AccumulateWhereTokens
…
}
This is just an example to show the process not a really working piece of code!
============== MethodParser ====================
int openParanths = 0;
int openBraces = 0;
ArrayOfTokens[] = [ the tokens extracted from code ];
void Parse()
{
for ArrayOfTokens → foreach token in array call CurrentMethod with token as argument
the token will be sent to the method that is set at the moment as the processing state method
==
the first token in the array will go into the VoidKeywordState method
}
void CurrentMethod = VoidKeyword; // default state
For each enum label we declare a method to handle the state
void VoidKeywordState(Token token)
{
if not ‘void’ token then throw an exception
else CurrentMethod = OpenParamsParanth
}
void OpenParamsParanthState(Token token)
{
if not ‘(’ token then throw an exception
else CurrentMethod = ParamType
}
…
void InsideMethodBody(Token token)
{
// again, this is just an example
switch (token)
{
case if:
== in a top to bottom compilation process, here we start accumulating tokens
until we reach line nr 13.
AccumulatedTokens.Add(token)
CurrentMethod = AccumulateIfTokens
case where:
== same thing for a 'where' section
AccumulatedTokens.Add(token)
CurrentMethod = AccumulateWhereTokens
...
}
}
void AccumulateIfTokensState(Token token)
{
switch(token)
{
case ‘(’:
OpenParanths ++;
AccumulatedTokens.Ad(token)
case ‘)’:
OpenParanths –
AccumulatedTokens.Add(token)
case ‘{’:
OpenBraces ++
AccumulatedTokens.Add(token)
case ‘}’:
OpenBraces –
!!!
if OpanBraces == 0 // end of section reached
== we would pass the tokens from line 5 to 13 to a new if paser instance
IfParser = new IfParser()
IfParser.Parse(AccumulatedTokens)
... handle the response from the if parser or the resulted AST node
continue parsing the method
===============================================
once the if parser has finished processing the tokens
this parser ( the method parser ) goes further with the process
... we set a new state and keep processing
CurrentMethod = InsideMethodBody
default:
AccumulatedTokens.Add(token)
}
}
void AccumulateWhereTokensState(Token token)
{
… same as in the if section
}
=======================================
Again, this is just an example.
For this kind of processing, there are multiple sections where the parsers stay in the same state and accumulate tokens.
If we process files it is not a problem.
However if we a typing code things change.
With a code completion engine, there is no longer a stream of tokens.
As soon as it is created, a token is sent to a parser.
So each token 'moves' the parser to the next state.
So if we type code and we reach line 5 ( the beginning of the if section )
we create a new if parser there and we start sending the tokens to the if parser.
When the if parser reaches the last state it is no longer used and we get back to the method parser.
Imagine this is a stack of parser instances
ParsersStack
– we start typing and we produce the token ‘void’
the beginning of a method has been detected.
A new method parser is created and pushed onto the stack
MethodParser <---- the tokens from the keyboard go into this parser now
ParsersStack
– we keep typing and reach line 5 ‘if’ keyword
the beginning of an if section has been detected.
A new if parser is created and pushed onto the stack.
IfParser <---- the tokens from the keyboard go into this parser now
MethodParser <---- the method parser has preserved it's state !!
ParsersStack
– we keep typing and reach line 9 ‘where’ keyword
the beginning of a where section has been detected.
A new where parser is created and pushed onto the stack.
WhereParser <---- the tokens from the keyboard go into this parser now
IfParser <---- the if parser has preserved it's state
MethodParser
ParsersStack
– we reach line 12 - the end of the where section
We pop the where parser from the stack
IfParser <---- we are back at the state the parser was when the where parser was created
MethodParser
ParsersStack
– we reach line 13 - the end of the if section
We pop the if parser from the stack
MethodParser <----- we are back at the state the parser was when the if parser was created
ParsersStack
The conclusion is, you must restore states from one parser to another and this is done with the help of the stack.
And all this is needed because with code completion - there is no longer a stream of tokens instead a single token at a time has to be processed -
This would be the top layer of the engine.
While we type, each parser can resolve types, conflicts, apply syntax color or highlight errors.
On a dot ‘.’ keypress , or any other for that matter, a parser can use it’s current state, the last token and decide what completion list to show.
While we type, the AST is created on the background and ‘go to definition’ ‘rename’ ‘refactor’ ‘show references’ begin to take shape.
Again , everything starts from the fact that with code completion we no longer have a stream of tokens.