Hey everyone! I’ve recently decided to move away from abstract syntax trees and work with bytecode instead. I’m following the Rust compiler’s lead, with its multiple layers of bytecode-like intermediate representations (HIR, MIR, and so on), and doing the same. However, I’m having trouble writing reliable, robust, maintainable bytecode generators. Does anybody know of any libraries for this? I doubt there are any, since not enough people are working with their own bytecode formats in Java or Kotlin to warrant a dedicated bytecode API. If there isn’t one, I’m thinking about making a Kotlin DSL for it, as it would be important for Beagle’s compiler.
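To make the DSL idea a bit more concrete, here’s a rough sketch of what a Kotlin builder for HIR-like instructions could look like. Every name in it (`Value`, `Instr`, `FunctionBuilder`, `hirFunction`, `ret`, the opcode strings) is hypothetical and only loosely mirrors the HIR textual format shown later in this post; it isn’t an existing library.

```kotlin
// Hypothetical value/instruction types loosely mirroring the HIR text format in this post.
sealed interface Value
data class Ref(val name: String) : Value
data class IntLit(val value: Int) : Value
data class BinOp(val op: String, val lhs: Value, val rhs: Value) : Value

sealed interface Instr
data class Call(val callee: String, val args: List<Value>) : Instr
data class Return(val value: Value) : Instr

data class Param(val name: String, val type: String)
data class HirFunction(val name: String, val params: List<Param>, val returnType: String, val body: List<Instr>)

// @DslMarker prevents implicit access to outer builder receivers once builders are nested.
@DslMarker
annotation class HirDsl

@HirDsl
class FunctionBuilder(private val name: String, private val returnType: String) {
    private val params = mutableListOf<Param>()
    private val body = mutableListOf<Instr>()

    // Declares a parameter and hands back a reference to it for use in the body.
    fun param(name: String, type: String): Ref {
        params += Param(name, type)
        return Ref(name)
    }

    // Expression helpers so function bodies read like the source language.
    fun add(lhs: Value, rhs: Value) = BinOp("add", lhs, rhs)
    fun sub(lhs: Value, rhs: Value) = BinOp("sub", lhs, rhs)
    fun mul(lhs: Value, rhs: Value) = BinOp("mul", lhs, rhs)
    fun div(lhs: Value, rhs: Value) = BinOp("div", lhs, rhs)

    // Statement helpers append instructions in order, keeping the body linear.
    fun call(callee: String, vararg args: Value) { body += Call(callee, args.toList()) }
    fun ret(value: Value) { body += Return(value) }

    fun build(): HirFunction {
        // Validate while building instead of debugging a broken dump later.
        check(body.lastOrNull() is Return) { "function '$name' must end with a return" }
        return HirFunction(name, params.toList(), returnType, body.toList())
    }
}

fun hirFunction(name: String, returnType: String, block: FunctionBuilder.() -> Unit): HirFunction =
    FunctionBuilder(name, returnType).apply(block).build()

// Usage: the `add` function from the example source.
val addFn = hirFunction("add", "Int") {
    val x = param("x", "Int")
    val y = param("y", "Int")
    ret(add(x, y))
}
```

The idea is that the builder owns the bookkeeping (parameter lists, ordering, the “must end in a terminator” rule), so the generator code only says what to emit.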
I’m thinking it could have an API for designing the bytecode in a way that lets you decorate it with metadata in the engine, as well as a way to create “plugins” for input and output. The input plugin answers “what are we using as the source or frame of reference for generating bytecode?” The output plugin answers “what are we doing with the generated bytecode?” I’m wondering whether this would be a worthwhile project for the Strumenta community. I’m still working out the details, but I have created an example of what I’m imagining. I’ll make a proof of concept, because I kind of need it ASAP, lol.
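Here’s one possible shape for the input/output plugins plus metadata decoration, as a rough sketch. `InstrMeta`, `Instruction`, `BytecodeSource`, `BytecodeSink`, `MetadataPass`, and `Pipeline` are all hypothetical names for illustration, and instructions are plain strings just to keep it short.

```kotlin
// Hypothetical metadata bag attached to each instruction (e.g. source spans, types).
data class InstrMeta(val entries: Map<String, String> = emptyMap()) {
    fun withEntry(key: String, value: String) = InstrMeta(entries + (key to value))
}

data class Instruction(val text: String, val meta: InstrMeta = InstrMeta())

// Input plugin: what we use as the source for generating bytecode.
fun interface BytecodeSource {
    fun generate(): List<Instruction>
}

// Output plugin: what we do with the generated bytecode.
fun interface BytecodeSink {
    fun consume(code: List<Instruction>)
}

// Decoration pass: returns updated metadata for one instruction without touching its text.
fun interface MetadataPass {
    fun decorate(index: Int, instr: Instruction): InstrMeta
}

class Pipeline(
    private val source: BytecodeSource,
    private val passes: List<MetadataPass> = emptyList(),
    private val sinks: List<BytecodeSink> = emptyList(),
) {
    fun run() {
        val decorated = source.generate().mapIndexed { i, instr ->
            // Passes run in order; each sees the metadata added by the previous ones.
            passes.fold(instr) { acc, pass -> acc.copy(meta = pass.decorate(i, acc)) }
        }
        // Every sink receives the same fully decorated instruction list.
        sinks.forEach { it.consume(decorated) }
    }
}
```

Because sources and sinks are single-method `fun interface`s, a plugin can be a lambda (e.g. a sink that prints the textual format) or a full class (e.g. an LLVM backend), and the pipeline doesn’t need to know which.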
The point of this is more efficient code generation and evaluation. It would be extremely useful for linters, language servers, virtual machines, complex compilers, and much more. I think a team-wide effort to build this would be amazing, but I’m wondering how realistic or in-demand something like this currently is. Even if demand isn’t high right now, I’ll still make a proof of concept.
This would help me with my bytecode layers: so far I’ve built two layers for my experimental compiler, and debugging them was hell. That said, I was able to produce the following output from the experimental compiler. What you’re about to see is the first bytecode layer (HIR) “stringified” into a textual representation. This is how I confirmed that the HIR generator was mostly working.
Source:
```
let hello = "Hello";
fn add(x: Int, y: Int): Int{
    return x + y;
};
fn sub(x: Int, y: Int): Int{
    return x - y;
};
fn mul(x: Int, y: Int): Int{
    return x * y;
};
fn div(x: Int, y: Int): Int{
    return x / y;
};
fn main(args: Int): Int{
    printf(hello);
    return 0;
};
```
HIR Bytecode Textual Format:
```
file 'llvm@C:\projects\kotlinx-llvm\toylang\run\files\test.toy'
global.var hello = string "Hello"
global.fun add(param.x type.Int, param.y type.Int) type.Int{
    term.return op.add ref x, ref y
}
global.fun sub(param.x type.Int, param.y type.Int) type.Int{
    term.return op.sub ref x, ref y
}
global.fun mul(param.x type.Int, param.y type.Int) type.Int{
    term.return op.mul ref x, ref y
}
global.fun div(param.x type.Int, param.y type.Int) type.Int{
    term.return op.div ref x, ref y
}
global.fun main(param.args type.Int) type.Int{
    call.printf(ref hello)
    term.return int 0
}
```
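For anyone wondering how a dump like this can be produced without the printer becoming hard to read, here’s a minimal sketch that keeps one small `render` function per node kind. The types (`HFun`, `HReturn`, `HOp`, etc.) are hypothetical, cover only the constructs in the example above, and are not the actual generator from my experimental compiler.

```kotlin
// Hypothetical HIR model covering only the constructs in the dump above.
sealed interface HirValue
data class HRef(val name: String) : HirValue
data class HInt(val value: Int) : HirValue
data class HStr(val value: String) : HirValue
data class HOp(val op: String, val lhs: HirValue, val rhs: HirValue) : HirValue

sealed interface HirInstr
data class HCall(val callee: String, val args: List<HirValue>) : HirInstr
data class HReturn(val value: HirValue) : HirInstr

data class HParam(val name: String, val type: String)
data class HGlobalVar(val name: String, val init: HirValue)
data class HFun(val name: String, val params: List<HParam>, val returnType: String, val body: List<HirInstr>)

// Values: each node kind maps to exactly one textual form.
fun render(v: HirValue): String = when (v) {
    is HRef -> "ref ${v.name}"
    is HInt -> "int ${v.value}"
    is HStr -> "string \"${v.value}\""
    is HOp -> "op.${v.op} ${render(v.lhs)}, ${render(v.rhs)}"
}

// Instructions: calls and terminators.
fun render(i: HirInstr): String = when (i) {
    is HCall -> "call.${i.callee}(${i.args.joinToString(", ") { render(it) }})"
    is HReturn -> "term.return ${render(i.value)}"
}

fun render(g: HGlobalVar): String = "global.var ${g.name} = ${render(g.init)}"

// Functions: header line, one indented line per instruction, closing brace.
fun render(f: HFun, indent: String = "    "): String = buildString {
    val params = f.params.joinToString(", ") { "param.${it.name} type.${it.type}" }
    appendLine("global.fun ${f.name}($params) type.${f.returnType}{")
    f.body.forEach { appendLine(indent + render(it)) }
    append("}")
}
```

Since every line of the dump comes from exactly one `render` overload, a wrong line in the output points straight at the node kind (or the generator step) that produced it.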
The code for these generators is honestly really hard to look at, and I feel like it should be easier. I have a feeling I’m going to be doing this kind of thing a lot, and I really encourage other language developers to follow suit: bytecode is laid out linearly in memory and is more compact, whereas an AST is naturally scattered across memory. ASTs are also really hard to work with in a shared environment, such as a language server. I really think this could be a great endeavour.
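To illustrate the memory-layout point, here’s a small sketch that flattens an expression tree into a single `IntArray` and then evaluates it with a loop over that array. The encoding (the hypothetical `Op` codes, postfix order, an operand stack) is an assumption for illustration and differs from the ref-based HIR above; the point is only that the whole program ends up in one contiguous array instead of many separately allocated nodes.

```kotlin
// Tree form: every node is a separate heap object linked by references.
sealed interface Expr
data class Lit(val value: Int) : Expr
data class Bin(val op: Int, val lhs: Expr, val rhs: Expr) : Expr

// Hypothetical opcodes for a flat, stack-based encoding.
object Op {
    const val PUSH = 0
    const val ADD = 1
    const val SUB = 2
    const val MUL = 3
    const val DIV = 4
}

// Post-order walk: operands are emitted before the operator that consumes them.
fun flatten(e: Expr): IntArray {
    val out = mutableListOf<Int>()
    fun emit(node: Expr) {
        when (node) {
            is Lit -> { out += Op.PUSH; out += node.value }
            is Bin -> { emit(node.lhs); emit(node.rhs); out += node.op }
        }
    }
    emit(e)
    return out.toIntArray()
}

// Evaluate the flat program with an operand stack; no tree pointers are followed.
fun eval(code: IntArray): Int {
    val stack = ArrayDeque<Int>()
    var pc = 0
    while (pc < code.size) {
        when (val opcode = code[pc++]) {
            Op.PUSH -> stack.addLast(code[pc++]) // the operand follows PUSH inline
            else -> {
                val rhs = stack.removeLast()
                val lhs = stack.removeLast()
                stack.addLast(
                    when (opcode) {
                        Op.ADD -> lhs + rhs
                        Op.SUB -> lhs - rhs
                        Op.MUL -> lhs * rhs
                        Op.DIV -> lhs / rhs
                        else -> error("unknown opcode $opcode")
                    }
                )
            }
        }
    }
    return stack.single()
}
```

The trade-off of a flat layout is that anything a tree node would naturally carry (source positions, types) has to live in side tables indexed by instruction offset.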