Beagle Update: Architecture Overview, Performance, and Future Marketing Plans

alcouch65 · May 15, 2020, 5:18pm

Hello, everybody! Hope everyone is doing alright in these times. Just a fair warning: the following content is rather advanced and rather technical.

My job has been put on hold because our investors are on standby, so I’ve had more time to work on Beagle, and I’ve made lots of progress. The first thing I wanna talk about is the general architecture. I’ve taken an asynchronous pipeline approach where every stage has a global manager that has a thread pool. Every module loaded from the given set of paths gets its own thread in that thread pool. Each stage for that module has its own data channel, through which the preceding stage emits its internal representation to the next stage.

Lexer - tokens -> 
    Parser - hir -> 
    Symbol Resolution - hir -> 
    Type checker - tir -> 
    Memmy - mir ->
    LLVM -> binary
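
To make the stage-to-stage channel idea concrete, here is a minimal Python sketch. It is not Beagle's actual implementation: it assumes a single module with one thread per stage and uses `queue.Queue` as the channel. The names `run_stage`, `compile_module`, and `DONE` are hypothetical, and the stage functions are placeholders that only tag their input with the name of the representation they would emit.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a stage's output stream


def run_stage(transform, inbox, outbox):
    """Pull items from inbox, transform each one, and push the result downstream."""
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)  # propagate end-of-stream to the next stage
            return
        outbox.put(transform(item))


def compile_module(source_words, stages):
    """Connect the stages with one channel between each pair and run them concurrently."""
    channels = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=run_stage, args=(fn, channels[i], channels[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for t in threads:
        t.start()
    # Feed the first stage; later stages start working as soon as items arrive.
    for word in source_words:
        channels[0].put(word)
    channels[0].put(DONE)
    # Drain the last channel until the end-of-stream sentinel shows up.
    results = []
    while True:
        item = channels[-1].get()
        if item is DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


# Placeholder stages (assumption): each one only wraps its input in a tag.
stages = [
    lambda x: ("token", x),
    lambda x: ("hir", x),
    lambda x: ("tir", x),
]
output = compile_module(["var", "hello"], stages)
```

Because each stage here is a single thread reading from a FIFO queue, items come out in the order they went in, and a downstream stage can begin work before the upstream stage has finished its input.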

When the lexer/tokenizer scans the input, it constructs tokens using lexical rules: keywords, literals, and delimiting tokens such as =, {, ), etc. Each token it produces is passed through the channel to the parser, which is constantly receiving the next token, parsing, and changing state on the fly. The result is HIR, or higher intermediate representation: chunks of bytecode produced by the parser that form an unmodified, in-memory representation of the input.

The TIR is the type-checked intermediate representation, a transformed form of the HIR in which the types have been processed. This is where abstract types are converted, traits are merged into their implementing structs, and the impl and override modifiers on abstract members are checked and erased.

The MIR is the middle intermediate representation, or memory-related intermediate representation, and is completely alien compared to HIR and TIR. MIR is produced by Memmy, the smart memory management analyzer and code generator. Memmy produces an internal reference graph and does complex compile-time analysis on objects to determine when to generate object init/drop functionality. It also generates virtual tables for abstract classes and interfaces, and trait flags for struct traits. At one point there was a way of viewing the MIR in the console. I’ve since changed the way the IR for both HIR and MIR works, but here’s what it used to look like:

Now, some might think that this kind of async architecture would introduce difficulties with error recovery and reporting. However, it just requires a synchronized swimming technique for sending messages to the manager when a problem occurs. If a problem occurs in a module, that module’s thread sends a message to the manager, and the manager commences a shutdown process that is also signaled to the rest of the compiler. I could go into it even deeper, but maybe another time.
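
The error-reporting flow described above could be sketched roughly like this in Python. This is an illustration under assumptions, not the real Beagle code: `module_worker`, `run_manager`, the `"bad"` input that stands in for a compile error, and the use of a `threading.Event` as the shutdown signal are all made up for the example.

```python
import queue
import threading


def module_worker(name, items, reports, shutdown):
    """Process one module's items and send exactly one report to the manager."""
    for item in items:
        if shutdown.is_set():  # another module failed, so stop quietly
            reports.put(("stopped", name, None))
            return
        if item == "bad":  # stand-in for a real lex/parse/type error
            reports.put(("error", name, f"unexpected input {item!r}"))
            return
    reports.put(("done", name, None))


def run_manager(modules):
    """Start one worker per module and trigger a shutdown on the first reported error."""
    reports = queue.Queue()
    shutdown = threading.Event()
    workers = [
        threading.Thread(target=module_worker, args=(name, items, reports, shutdown))
        for name, items in modules.items()
    ]
    for w in workers:
        w.start()
    errors = []
    for _ in workers:  # every worker sends exactly one report
        kind, name, message = reports.get()
        if kind == "error":
            errors.append((name, message))
            shutdown.set()  # signal the rest of the workers to wind down
    for w in workers:
        w.join()
    return errors, shutdown.is_set()


errors, was_shut_down = run_manager({"a": ["x", "bad", "y"], "b": ["x", "y"]})
```

Whether module `b` finishes normally or stops early depends on thread timing, which is why the sketch only collects the error reports and the shutdown flag rather than relying on any ordering between modules.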

The aforementioned process is not completely implemented, but the majority of it is, and it has resulted in high performance because type checking is done while the module might still be getting tokenized and parsed. The following input and video demonstrate how fast it is. It is a rather small example, but it still demonstrates its speed. The output is the type checker’s representation of the input.

var hello: String = "Hello"

fun testFunc(test: String) {
    let time = 1100 - 20 + 10
}

fun anotherTest(test: Int, yo: Bool): String{
    let name = "alex"
}

The type checker uses this as a beefier representation that is unique to type checking. These objects also have built-in methods for efficiently and cleanly doing various type-checking operations.
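
As a rough idea of what objects with built-in type-checking methods could look like, here is a small Python sketch. It is purely hypothetical and does not reflect Beagle's real TIR: the classes `Type`, `Literal`, `BinaryOp`, and `VarDecl`, and the methods `infer`, `check`, and `is_assignable_to`, are invented for illustration, and only Int and String are modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Type:
    name: str

    def is_assignable_to(self, other):
        """In this sketch a type is only assignable to itself."""
        return self == other


INT = Type("Int")
STRING = Type("String")


@dataclass
class Literal:
    value: object

    def infer(self):
        # Only integer and string literals are modeled here.
        return INT if isinstance(self.value, int) else STRING


@dataclass
class BinaryOp:
    op: str
    left: object
    right: object

    def infer(self):
        left, right = self.left.infer(), self.right.infer()
        if left != INT or right != INT:
            raise TypeError(
                f"'{self.op}' expects Int operands, got {left.name} and {right.name}"
            )
        return INT


@dataclass
class VarDecl:
    name: str
    declared: object  # a Type, or None when the type is left to inference
    init: object

    def check(self):
        """Infer the initializer's type and compare it with the declared type, if any."""
        actual = self.init.infer()
        if self.declared is not None and not actual.is_assignable_to(self.declared):
            raise TypeError(
                f"{self.name}: expected {self.declared.name}, got {actual.name}"
            )
        return actual


# let time = 1100 - 20 + 10
time_decl = VarDecl(
    "time", None, BinaryOp("+", BinaryOp("-", Literal(1100), Literal(20)), Literal(10))
)
# var hello: String = "Hello"
hello_decl = VarDecl("hello", STRING, Literal("Hello"))

time_type = time_decl.check()
hello_type = hello_decl.check()
```

Keeping the checks as methods on the nodes means each node kind carries its own rule, which is one way to get the "clean" type-checking operations mentioned above.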

Now that we’ve gotten that out of the way, I wanna talk about how I plan to market Beagle when it’s ready. One thing I’ve noticed about many new langs these days is that there is a significant period of a lang’s lifetime where it is almost inaccessible or unusable. For example, Jai is closed source and has only been released to a select group of individuals. Zig is open source and released, but its documentation on getting started is quite lacking, so those who are new to a language like Zig struggle significantly. I did try it out at one point and eventually gave up because I could barely find anything on something as basic as arrays. While it is a nice, simple language for writing C-like code in a robust way, and while it is accessible, it’s pretty much unusable to most people at this point. Crystal is another nice language I look to for resources, but unfortunately it’s not completely available on Windows, which is a bit of a bummer for people willing to give it a try; it’s mostly usable and accessible on Linux distros. Kotlin was not that usable or stable for the first 5 years of its lifetime, if I’m not mistaken. Rust was a beast to work with until 2017/2018 due to its strict memory management model. I do love Rust, but I don’t want Beagle to have any of these kinds of hiccups or delays. I want as many people as possible to be able to use it right off the bat with its first official release.

The first pre-alpha release is what I call “The Big Functional Update” (yes, update names take after Minecraft). It provides the basic language features plus the things you need to write highly functional code, from higher-order functions and lambdas with function types to sugary control flow like unless, until, and loop, along with the rest of the basic syntax. It will only have functional features at the start, but I want to give it a decent yet basic build system with a very basic form of build script that you can use for basic configuration.

const name: String = "test"
const group: String = "couch"
const version: String = "1.0"

const dependencies: [String] = ["something:1.0", "another-thing:1.0"]
const outDir = "out/test"

It will not be anything like Gradle and, quite frankly, I don’t like Gradle. Beagle build scripts will be much simpler, more straightforward, and easier to work with. I want anyone to be able to try it out every step of the way. I want every version I release to be immediately available for use, so that even if it’s pre-alpha, anyone can write Beagle code for whatever they want with the guarantee that they can always stay up to date with ease.