👨‍🔬Alpha #4: garbage collection and golden testing

This week in Alpha was a short one. I had courses to visit, so I only had three days to code. Anyway, I finished garbage collection, refactored the code, and set up the test infrastructure and CI.

Shadow stack

The last week was finished with shadow stack working for Rust code, but not code generated by Alpha compiler.

Adding shadow stack support to the compiler yet again turned out to be more complicated than I imagined.

First, the shadow stack format from the previous post is a bad one:

It shows a shadow stack as a separate array of pointers to variables on the stack. This works, but it doubles the stack usage and adds more runtime overhead (filling out the new array). Ideally, you should allocate a single array of GC pointers on the stack, push it as a GC frame, and use individual entities as GC roots.

To allocate a single array, you need to know how many GC roots a function has. This requires either writing an analysis on AST (something the current AST representation is not well-suited for), or writing an LLVM pass to collect all GC roots in a function, allocate an array and patch other code to use it—doable though obviously not a quick fix.

If only LLVM could do that for me…

The last week I wrote that I couldn’t get LLVM shadow stack GC working, and it failed for me with the following error:

LLVM ERROR: unsupported GC: shadow-stack (did you remember to link and initialize the CodeGen library?)

Turned out, GC strategies are not linked by LLVM C bindings. The solution is simple—create a small C++ shim and link the appropriate function. The function is a no-op, but it ensures all relevant statics are linked.

#include <llvm/IR/BuiltinGCs.h>

extern "C" void LLVMLinkAllBuiltinGCs() {
  llvm::linkAllBuiltinGCs();
}

After that, the shadow stack works as expected.

Shadow stack GC strategy is simple, and there is still room for improvement. For example, Julia has a couple of passes to minimize the number of GC roots allocated—it analyses roots liveness and reuses a single GC slot for multiple values.

Another option is to embed GC roots information into the stack frames directly. This way, the function needs less allocation and does not need to maintain the second stack—much lower runtime overhead!

Anyway, these are the options for the future. For now, roots tracking works well enough!

Garbage collection

The current garbage uses Cheney’s algorithm for garbage collection, a variation on the semi-space copying garbage collector.

It maintains two memory blocks: “to” and “from.” Only one of which is in use at any one time. New objects are allocated in the “to” block. When garbage collection is triggered, the blocks are swapped, and all live objects are copied over to the new “to” block. After that, the “from” block can be mprotected, so all reads are caught.

The “from” page can be reused on the next garbage collection, becoming the new “to” page. Alpha doesn’t do that but discards the page and allocates a new one. The new page is guaranteed to be zero-initialized, which is helpful. I believe requesting a new zero-initialized page from OS might be faster than memsetting the old one, though I haven’t tested that.

There are lots of ways to improve garbage collection. First, it needs to support growing memory (current regions are fixed-sized). Next, the garbage collector should become generational. Because objects in Alpha are immutable (should be), old objects never point to newer objects. This means that you don’t have to scavenge old generations when collecting a new one.

Testing

Compilers are big beasts that are hard to test.

They use large data structures (e.g., AST) that are tedious to construct manually. It’s hard to verify generated code because optimizations can change it without changing the program behavior. And there are way too many possible cases to test.

You can write unit tests, but the predominant way of testing a compiler is end-to-end tests—writing thousands of small programs/examples, then compiling and running them for every new compiler version. This is often called “golden” testing: there is an expected—“golden”—output for each test.

Alpha now has this, too (though it’s not thousands yet). And I even caught a regression in multi-methods implementation! Now I can rest assured it won’t slip unnoticed again.

Here is how a test looks:

fn f(x, y) = 0

fn f(x, y: i64) = 1

print(f(0, 2.0)) # 0

print(f(0, 0))   # 1

# expected stdout:
# 0
# 1

The tests can also serve as examples—you can browse them at alpha/examples.

Oh, and Alpha got CI, too—all of the tests run automatically on every push.

Backlinks