👨‍🔬Alpha #3: dynamically-sized types and almost-finished garbage collection

This week in Alpha was an exhausting one. Alpha got new dynamically sized types (Symbol, String, SVec) and an almost-finished garbage collection. The types were a quick add, but then most of the week was spent debugging garbage collection, segmentation faults, and memory crashes.

SVec

SVec is a small vector—an immutable length-prefixed array of pointers.

It’s awful performance-wise—adding a single element is O(n)O(n) (compared to amortized O(1)O(1) for vectors in other languages). Lookup is O(1)O(1) though.

The main advantage is that it’s simple, persistent, and allows building more sophisticated data structures on top. (I will cover some cool immutable vectors in the future.)

String

The String is similar to SVec, but it is an immutable length-prefixed array of bytes encoding a UTF-8 string. The string is both length-prefixed and NUL-terminated—this will allow easy interop with C in the future.

Symbol

Symbols are very much like strings. The only difference is that symbols are interned (aka hash-consed)—two symbols with the same name will always resolve to the same pointer. It provides for a fast comparison—it’s just a pointer equality check.

Interning is implemented by storing all symbols in a binary search tree sorted by string hash. The tree is never re-balanced, but that is fine because hashes are uniformly distributed, so the search is still O(logn)O(\log n) (because the tree is expected to be balanced for nn \to \infty).

These new structures allowed me to update definitions of DataType and Method, so they no longer contain Rust-specific types (Vec, String). This, in turn, allows them to be allocated on the GC heap and be traversed by the garbage collector as normal values.

This leads us to the next topic…

Garbage collection

Regardless of the garbage collection algorithm, the first step is determining which objects are still reachable and therefore alive. The search starts with GC roots—pointers into GC-managed heap that are not themselves stored on the heap. These are global and stack variables.

Two major approaches to reachability analysis are conservative and precise.

Conservative algorithms are simple—they traverse all global variables and the stack and treat anything that could be a pointer as a live reference. It might produce false positives (e.g., an integer holding a value similar to a pointer), but it never misses live references. The primary benefit is that conservative algorithms can be implemented without any compiler backend cooperation—the algorithm doesn’t need to know the layout of the stack or where variables are. Boehm GC, the most popular plug-n-play garbage collector, is conservative.

The downside is that conservative garbage collectors cannot relocate objects. Therefore, you cannot make a conservative compacting or copying garbage collector.

Precise algorithms only traverse known pointers. Therefore, they need exact information on where pointers are located in the stack and the objects’ layout (where pointers are stored in each object).

Precise algorithms are harder to implement, but they have no false positives and allow objects relocation.

Traditionally, precise algorithms required compiler backend to provide memory layout for stack frames—where pointers are located in the stack—which is hard to set up when using third-party or multiple backends. However, there is a backend-independent technique that can provide this information—shadow stack.

Shadow stack

Shadow stack is a technique of maintaining a linked list of frames that mimics the call stack but contains the information you need (e.g., location of pointers). This adds runtime overhead, but the technique is cross-platform, doesn’t require backend support, and is easy to implement.

The idea is simple—whenever you have local variables or intermediary results that point into the GC heap, you store their addresses in an array and push it to a thread-local list of frames. When you leave the function, you pop the frame from the stack. This way, the GC can get addresses of all variables that’ll need patching when values are moved. (The following paper describes the technique in more detail: Accurate Garbage Collection in an Uncooperative Environment.)

[2021-11-28 Sun]: This is a bad representation for shadow stack. See Alpha #4: garbage collection and golden testing for updated picture.

LLVM claims to support shadow stack, but I couldn’t get it working. I get the following error:

LLVM ERROR: unsupported GC: shadow-stack (did you remember to link and initialize the CodeGen library?)

After a bit of googling, I’ve found zero examples and basically zero usage in other languages. Even Julia (which I heavily copy draw inspiration from) does not use LLVM GC support and reimplements shadow stack from scratch.

GC implementation process and debugging tips

Introducing garbage collection is tough—you have to carefully annotate all your functions to properly maintain the shadow stack, and you have to debug the garbage collection algorithm so it does not leave broken pointers.

Debugging garbage collection is tough as errors manifest as spooky action at a distance—segmentation faults, random memory changing its value, assertions failing, etc.—things you’re not accustomed to seeing in Rust (or any language, really).

That’s why you want to add GC as early as possible. Maintaining it afterwards is an incremental effort.

Here are some GC debugging tips. They fall into two categories: getting more info on what’s going on or catching errors earlier.

Use debugger to see backtraces

Segmentation faults are not your regular language exceptions—they terminate the application without showing you a backtrace or anything useful. Use a debugger to get the backtrace and examine variables around.

Tracing and memory dumps

Print backtraces for every allocation (so you know which function is allocating the memory), trace the garbage collection process, and dump memory buffers to debug what’s going on.

 TRACE alpha::gc                >   from@0x7ffff3855000: Length: 472 (0x1d8) bytes
0000:   18 32 98 57 55 55 00 00  00 00 00 00 00 00 00 00  d8 30 98 57 55 55 00 00  08 00 00 00 00 00 00 00
0020:   a8 68 f8 57 55 55 00 00  18 60 86 f3 ff 7f 00 00  00 00 00 00 00 00 00 00  08 50 85 f3 ff 7f 00 00
0040:   00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
0060:   d8 30 98 57 55 55 00 00  ff ff ff ff ff ff ff ff  78 6a c4 57 55 55 00 00  38 31 98 57 55 55 00 00
0080:   01 7c ff ff ff 7f 00 04  08 50 85 f3 ff 7f 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00a0:   00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  10 51 85 f3 ff 7f 00 00  d8 30 98 57 55 55 00 00
00c0:   08 00 00 00 00 00 00 00  28 73 f8 57 55 55 00 00  18 40 86 f3 ff 7f 00 00  00 00 00 00 00 00 00 00
00e0:   08 50 85 f3 ff 7f 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
0100:   00 00 00 00 00 00 00 00  d8 30 98 57 55 55 00 00  00 00 00 00 00 00 00 00  38 6a f8 57 55 55 00 00
0120:   38 31 98 57 55 55 00 00  00 00 00 00 00 00 00 00  08 50 85 f3 ff 7f 00 00  00 00 00 00 00 00 00 00
0140:   00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  18 32 98 57 55 55 00 00
0160:   01 00 00 00 00 00 00 00  98 51 85 f3 ff 7f 00 00  18 32 98 57 55 55 00 00  02 00 00 00 00 00 00 00
0180:   38 31 98 57 55 55 00 00  38 31 98 57 55 55 00 00  98 32 98 57 55 55 00 00  b0 51 85 f3 ff 7f 00 00
01a0:   00 e0 85 f3 ff 7f 00 00  18 32 98 57 55 55 00 00  01 00 00 00 00 00 00 00  58 c1 85 f3 ff 7f 00 00
01c0:   98 32 98 57 55 55 00 00  78 51 85 f3 ff 7f 00 00  00 b4 aa 55 55 55 00 00
 TRACE alpha::gc                >   to  @0x7ffff3854000: Length: 472 (0x1d8) bytes
0000:   18 32 98 57 55 55 00 00  00 00 00 00 00 00 00 00  d8 30 98 57 55 55 00 00  08 00 00 00 00 00 00 00
0020:   a8 68 f8 57 55 55 00 00  18 60 86 f3 ff 7f 00 00  00 00 00 00 00 00 00 00  08 40 85 f3 ff 7f 00 00
0040:   00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
0060:   d8 30 98 57 55 55 00 00  ff ff ff ff ff ff ff ff  78 6a c4 57 55 55 00 00  38 31 98 57 55 55 00 00
0080:   01 7c ff ff ff 7f 00 04  08 40 85 f3 ff 7f 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
00a0:   00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  10 41 85 f3 ff 7f 00 00  d8 30 98 57 55 55 00 00
00c0:   08 00 00 00 00 00 00 00  28 73 f8 57 55 55 00 00  18 40 86 f3 ff 7f 00 00  00 00 00 00 00 00 00 00
00e0:   08 40 85 f3 ff 7f 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
0100:   00 00 00 00 00 00 00 00  d8 30 98 57 55 55 00 00  00 00 00 00 00 00 00 00  38 6a f8 57 55 55 00 00
0120:   38 31 98 57 55 55 00 00  00 00 00 00 00 00 00 00  08 40 85 f3 ff 7f 00 00  00 00 00 00 00 00 00 00
0140:   00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  18 32 98 57 55 55 00 00
0160:   01 00 00 00 00 00 00 00  90 41 85 f3 ff 7f 00 00  98 32 98 57 55 55 00 00  a8 41 85 f3 ff 7f 00 00
0180:   00 b4 aa 55 55 55 00 00  98 32 98 57 55 55 00 00  c8 41 85 f3 ff 7f 00 00  00 e0 85 f3 ff 7f 00 00
01a0:   18 32 98 57 55 55 00 00  02 00 00 00 00 00 00 00  38 31 98 57 55 55 00 00  38 31 98 57 55 55 00 00
01c0:   18 32 98 57 55 55 00 00  01 00 00 00 00 00 00 00  58 c1 85 f3 ff 7f 00 00

It’s hard to read these, but it’s better than nothing.

If you use Rust, backtrace and pretty-hex are some useful crates here. tracing was also helpful to get more structured logs that show call nesting. (It’s funny because it basically maintains a shadow stack of its own.)

Run GC as often as possible

Instead of running GC when a threshold is reached, run in before every allocation. This way, you’ll quickly catch what variables are not properly rooted. (That’s a performance kill, but that’s for debugging only.)

Use page protection to catch bugs even sooner

If you develop a compacting/copying garbage collector, it’s helpful to use page protection.

After you move values out of a page, you can use mprotect to set permissions to none—even reading from a page is forbidden. This might seem useless—who the fuck needs a memory that they cannot read?—but this helps catching errors earlier. Instead of reading values from invalid memory, processing them, and crashing somewhere down the line, your app will crash as soon as it tries to access the old memory. If you see SIGSEGV: address access protected, you know some pointer didn’t get updated by the GC.

I use region for this.

Write tests

It might seem complicated to write tests for a garbage collector, but that is still possible. You don’t have to be clever to cover all cases—simply allocating different types, and testing invariants was good enough for me.

This is the test that caught the last bug I’ve been tracking down for days:

#[test]
#[serial]
fn type_t() {
    crate::types::init();
    unsafe {
        let ty = DataType::new(symbol("TestDataType"), ANY_T.load(), 0, &[]);
        gc_box!(ty);
        let t = Type::new(ty.load());
        gc_box!(t);

        collect_garbage();

        assert_eq!((*t.load()).t, ty.load());
    }
}

Take breaks

I’ve implemented many memory allocators before, but copying garbage collector is much more challenging—the memory is constantly moving, pointers change, a typical trace is 50k+ lines long, and it’s hard to grasp what’s going on. It’s very exhausting, and tracking down all bugs took 4 days.

Taking breaks is a rather universal tip when you’re stuck in debugging for long, but I am still surprised how well it works—most of the progress was made right after a night’s sleep or a good long break.

Current status and next steps

Garbage collection is almost finished: the collector works, global variables are rooted, and Rust code maintains the shadow stack (I hope). The last bit is modifying the compiler, so generated Alpha code maintains the shadow stack, too.

After that is done, it will be an excellent opportunity to refocus on code quality. The previous weeks were rather expansive—I’ve added new features fast, the code quality has suffered and needs a cleanup. After GC is done, Alpha won’t be limited by a fixed-size memory, so I could write more tests and refactor aggressively.

It might also be a good time to provide some overview of the project structure.

Backlinks