Recently, while in Manchester, I designed and implemented mgc - the manchester garbage collector (#10) for purple-garden. This article is a deep dive into its inner workings and the reasoning behind its design.
Intro
While garbage collection in general is a widely researched subject, combining multiple garbage collection techniques can yield better results than relying on a single strategy. The manchester garbage collector (mgc) is such a composite of several paradigms: allocation into preallocated memory regions, reachability analysis via recursive root set tracing, and compacting semi-space copying. None of this is novel on its own; the combination is engineered specifically for this runtime.
Combining these approaches enables fast allocation, low allocation latency and reduced fragmentation.
Purple-garden
mgc is specifically engineered for the purple-garden runtime. Purple-garden
is a minimalist scripting language, designed and implemented with a focus on
performance and a low memory profile, see:
xnacly/purple-garden.
fn greeting :: greetee { std::println("hello world to:" greetee) }
greeting(std::env::get("USER")) # hello world to: $USER

var v = "Hello World"
std::println(std::runtime::type(v) std::len(v)) # str 11
Purple-garden is focused on embeddability and ease of use as a scripting language. For this, it has a small memory footprint, a high performance runtime and a minimalist, yet useful standard library. It is implemented in C as a register-based bytecode compiler and virtual machine.
While the above example doesn't allocate memory (in the runtime itself) and therefore isn't a great fit for this article, it still shows what purple garden is about.
A better example for some runtime heap pressure is the following recursive string concatenation:
fn f :: n {
  match {
    n < 100000 {
      f(n+1)
      var s1 = std::str::append("a" "b")
      var s2 = std::str::append("c" "d")
      var pair = std::str::append(s1 s2)
    }
    { std::println(n) }
  }
}
f(0)
std::runtime::gc::cycle()
Since f does not allocate enough memory to trigger the garbage collector, see
Triggering Garbage Collection, we manually
trigger a collection via std::runtime::gc::cycle(). Also, purple garden uses
a separate allocator for call frames to reduce gc pressure.
Garbage collection stages
The mgc is divided into three stages, each requiring the runtime to stop for the duration of the collection. All stages together form a collection cycle, see below:
+-------------+
| Roots Set   |
| (VM regs,   |
| globals...) |
+------+------+
       |
       v
+-------------+
| Mark Phase  |
| Mark all    |
| live objects|
+------+------+
       |
       v Copy
+-------------------+   live   +-------------------+
| Old Bump Space    |  objects | New Bump Space    |
| (old allocator)   | -------> | (new allocator)   |
+-------------------+          +-------------------+
       |
       v
+--------------+
| Reset Old    |
| Bump Alloc   |
| (len=0,pos=0)|
+------+-------+
       |
       v
+-------------+
| Swap Alloc  |
| old <-> new |
+-------------+
Mark
To manage allocations in said virtual machine, the garbage collector defines a set of starting points from which it walks all reachable allocations, marking them as reachable. This set is called a root set. In purple-garden (pg), this root set consists of the current registers and the variable table holding local variable bindings. Since variables in higher scopes are considered alive even while they aren't in scope, the garbage collector has to take parent scopes into account. This is implemented by walking the root set recursively, setting the mark bit on all reachable objects.
For instance in the following example, if the gc runs after z(): a and b
are alive, while c isn’t anymore and can be cleaned up safely:
# alive
var a = allocating()
fn z {
  # dead
  var c = allocating()
}
# alive
var b = allocating()
z()
Copy
Once the alive subset A of the currently allocated set M is determined, the values in A can be safely copied from the previously used memory region (from-space) to the new space (to-space). Both spaces are backed by bump allocators, which are implemented as segmented lists of memory regions allocated through mmap.
After copying, the from-space is reset, but not deallocated. Then from-space and to-space are swapped, so from-space is now the to-space, and vice versa.
Rewriting references
Since the copy and move to new memory locations invalidates previously held references, rewriting said references to point to the newly copied memory regions is necessary.
For this, the previously established root set needs to be walked again and all references updated.
Reference implementation
Putting the above concepts into practice, and into C, is the subject of the next chapters.
typedef struct {
  // from-space
  Allocator *old;
  // to-space
  Allocator *new;
  void *vm;
  GcHeader *head;
  size_t allocated;
  size_t allocated_since_last_cycle;
} Gc;

Gc gc_init(size_t gc_size) {
  size_t half = gc_size / 2;
  return (Gc){
      .old = bump_init(half, 0),
      .new = bump_init(half, 0),
      .head = NULL,
  };
}
Allocator is a simple struct abstracting allocation away, for instance for a bump allocator:
typedef struct {
  // Allocator::ctx refers to an internal allocator state and owned memory
  // areas, for instance, a bump allocator would attach its meta data (current
  // position, cap, etc) here
  void *ctx;
  Stats (*stats)(void *ctx);
  void *(*request)(void *ctx, size_t size);
  void (*reset)(void *ctx);
  void (*destroy)(void *ctx);
} Allocator;
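To make the indirection concrete, here is a minimal sketch of a bump allocator behind this interface; BumpCtx and the function names are illustrative, and the real implementation uses growable, mmap backed chunks (see the section on dynamically growable bump allocation below):

typedef struct {
  uint8_t *block; // backing memory
  size_t pos;     // current offset into block
  size_t cap;     // total capacity of block
} BumpCtx;

static void *bump_request(void *ctx, size_t size) {
  BumpCtx *b = ctx;
  size = (size + 7) & ~(size_t)7; // keep payloads 8 byte aligned
  if (b->pos + size > b->cap) {
    return NULL; // the real allocator grows a new chunk here
  }
  void *p = b->block + b->pos;
  b->pos += size;
  return p;
}

// resetting is what makes reclaiming a whole semi-space O(1)
static void bump_reset(void *ctx) { ((BumpCtx *)ctx)->pos = 0; }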
Keeping track of allocations
For the garbage collector to know which regions it handed out, of what size and of which type, each region is allocated with a metadata header. Said header consists of:
typedef struct GcHeader {
  unsigned int marked : 1;
  unsigned int type : 3;
  uintptr_t forward;
  uint16_t size;
  struct GcHeader *next;
} GcHeader;
The 3 type bits identify the payload as either raw bytes, a string, a list or a map, see below. The header also holds a forwarding pointer for reference rewriting, the size of the corresponding payload (object sizes are capped at 64KB, which matches current runtime needs) and a pointer to the next allocation. This enables a heap scan in the sense of iterating a linked list of headers, helping in the rewriting process.
typedef enum {
  // just bytes
  GC_OBJ_RAW = 0b000,
  // a string with a reference to an inner string, can be allocated or not
  GC_OBJ_STR = 0b001,
  // list has zero or more children
  GC_OBJ_LIST = 0b010,
  // map holds allocated buckets with owned children
  GC_OBJ_MAP = 0b011,
} ObjType;
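Walking the heap then amounts to iterating this chain of headers; a minimal sketch, assuming the Gc struct from above:

for (GcHeader *h = gc->head; h; h = h->next) {
  void *payload = (void *)(h + 1); // the payload sits right behind the header
  // inspect h->type, h->marked, h->size, payload ...
}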
So an allocation looks like the following:
   allocation
  (raw pointer)
        |
        |
        v
+----------------------+
| GcHeader             | <-- header
+----------------------+
|                      |
|  payload (size B)    | <-- data handed out as ptr to the user
|                      |
+----------------------+
Each GcHeader is 32B, thus each heap allocation has an added overhead of 32B.
void *gc_request(Gc *gc, size_t size, ObjType t) {
  void *allocation = gc->old->request(gc->old->ctx, size + sizeof(GcHeader));
  void *payload = (char *)allocation + sizeof(GcHeader);
  GcHeader *h = (GcHeader *)allocation;
  h->type = t;
  h->marked = 0;
  h->size = size;
  h->next = gc->head;
  gc->head = h;
  gc->allocated_since_last_cycle += size;
  gc->allocated += size;
  return (void *)payload;
}
As explained before, after marking, all alive objects
are copied. This process also involves walking the chained list of headers, but
is restricted to those with GcHeader.marked toggled. Each object is copied into
to-space and the GcHeader.forward field is set to the new memory region.
(Almost) Zero cost abstractions and small Values
Values in the purple garden runtime use three bits for encoding their type:
typedef enum {
  V_NONE, // zero value, comparable to rusts Option::None
  V_STR, // string view
  V_DOUBLE, // floating point number
  V_INT, // integer

  V_TRUE, // booleans
  V_FALSE,

  V_ARRAY, // containers / adts
  V_OBJ,
} ValueType;
A union stores the data for each type (excluding true and false, since these do not require further data): Str for the custom string abstraction, List for the dynamically growing array, Map for the hash map, a double and an int64_t.
typedef struct Value {
  unsigned int type : 3;
  union {
    Str string;
    List *array;
    Map *obj;
    double floating;
    int64_t integer;
  };
} Value;
A single bit marks a value as optional: is_some, making each type optionally representable:
typedef struct Value {
  unsigned int type : 3;
  unsigned int is_some : 1;
  union {
    Str string;
    List *array;
    Map *obj;
    double floating;
    int64_t integer;
  };
} Value;
With a bit tag on Value itself, the valid states are: every type except V_NONE may be combined with is_some being either true or false, while V_NONE can only be combined with is_some being false. This invariant is checked in the runtime, and a dedicated function determines whether an instance of Value is optional:
inline bool Value_is_opt(const Value *v) {
  return v->type == V_NONE || v->is_some;
}
The big advantage of this approach is enabling a zero allocation, zero copy and
zero indirection optional implementation in the runtime, since each and all
values can be turned into an optional value by setting the Value.is_some bit.
If optionality were instead implemented via a V_OPTION type, with Value.inner holding the wrapped value and Value.is_some indicating whether the option holds something, every optional would require an allocation, due to the recursive nature of such an implementation.
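For contrast, a minimal sketch of such a hypothetical boxed variant; V_OPTION and the option member are not part of the runtime:

typedef struct Value Value;
struct Value {
  unsigned int type : 3; // would gain a V_OPTION variant
  union {
    // [...] existing members
    struct {
      bool is_some;
      Value *inner; // must be a pointer, a struct cannot contain itself
                    // by value, forcing an allocation per optional
    } option;
  };
};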
# optionals
var opt_none = std::None()
var opt_some = std::Some([])
std::opt::or(opt_none "anything else") # -> "anything else"
std::opt::unwrap(opt_some) # -> []
std::opt::expect(opt_some "would panic with this message") # -> []
std::opt::is_some(opt_some) # -> true
std::opt::is_none(opt_none) # -> true
Standard library functions interacting with optionals live in std::opt, and many functions in other packages return or accept optional values. Functions in this package are trivial to implement with this value design, for instance std::opt::{some, none, or}:
static void pg_builtin_opt_some(Vm *vm) {
  Value inner = ARG(0);
  inner.is_some = true;
  RETURN(inner);
}

// INTERNED_NONE is a static Value instance that's always available to the runtime

static void pg_builtin_opt_none(Vm *vm) { RETURN(*INTERNED_NONE); }

static void pg_builtin_opt_or(Vm *vm) {
  Value lhs = ARG(0);
  Value rhs = ARG(1);
  ASSERT(Value_is_opt(&lhs), "Or: lhs wasnt an Optional");
  if (!lhs.is_some) {
    RETURN(rhs);
  } else {
    lhs.is_some = false;
    RETURN(lhs);
  }
}
The design also allows for numeric Value instances without allocations, trading a larger inline Value for fewer heap allocations in hot paths of the runtime.
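To make that concrete, constructing numeric values touches neither gc_request nor the header chain:

// payloads live inline in the union, no gc interaction needed
Value n = {.type = V_INT, .integer = 42};
Value d = {.type = V_DOUBLE, .floating = 3.14};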
Differentiating heap and non-heap values
For purple garden, booleans, doubles and integers don't require heap allocation. Strings which are known at compile time are simply windows into the initial input the interpreter gets, therefore not heap allocated via gc_request and not subject to marking or collection.
Thus a marker for distinguishing between heap values and non-heap values is necessary: Value.is_heap. Only values with this bit toggled are considered for marking and collection, and passed to both the copy and rewrite phases of the garbage collector.
typedef struct Value {
  unsigned int is_heap : 1;
  unsigned int type : 3;
  union {
    Str string;
    List *array;
    Map *obj;
    double floating;
    int64_t integer;
  };
} Value;
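To illustrate, a minimal sketch of how a runtime created string could be handed out as a heap value; make_heap_str is an illustrative helper, not part of the codebase:

Value make_heap_str(Gc *gc, const uint8_t *bytes, uint32_t len) {
  // copy the bytes into a gc managed buffer
  uint8_t *buf = gc_request(gc, len, GC_OBJ_RAW);
  memcpy(buf, bytes, len);
  Value v = {0};
  v.type = V_STR;
  v.is_heap = 1; // now subject to marking, copying and reference rewriting
  v.string = (Str){.p = buf, .len = len};
  return v;
}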
Dynamically growable bump allocation
Info - Growable bump allocators
For an in depth commentary on bump allocation, growable chunks, mmap and segmented lists built on top of these concepts, see the C part of my recent article: Porting a Segmented List From C to Rust: C Implementation.
Since allocations are expensive when they require a system interaction via syscalls, purple garden is designed to reduce syscalls as much as possible. To achieve this, a growing bump allocator is used. Upon getting a request the bump allocator can't satisfy, the allocator itself requests a new chunk of memory from the operating system. This chunk is double the size of the previous one, while the first chunk is as large as the page size of the system it is running on.
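A sketch of that growth policy; grow_chunk is illustrative, the real allocator tracks its chunks in a segmented list and handles mmap failure:

#include <sys/mman.h>
#include <unistd.h>

// request the next chunk from the OS: the first chunk is one page, every
// following chunk doubles the previous size
static void *grow_chunk(size_t *chunk_size) {
  *chunk_size = *chunk_size ? *chunk_size * 2
                            : (size_t)sysconf(_SC_PAGESIZE);
  return mmap(NULL, *chunk_size, PROT_READ | PROT_WRITE,
              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}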
String abstraction
Since C style strings suck due to not carrying their length at all times, purple garden has a string abstraction built in. It improves on the computationally expensive C string interactions by making hashing and length first class citizens. Str is a view into a buffer it doesn't manage: it holds a pointer to a buffer, a length and a hash. Its backing storage is immutable. Str is cheap to copy and therefore kept by value, not by reference, in Value.
typedef struct __Str {
  uint64_t hash;
  uint32_t len;
  const uint8_t *p;
} Str;
Str.len is computed at creation time. A static string that's known when compiling the runtime (things like error messages or string representations of value type names) is passed to the STRING macro:
#define STRING(str) \
  ((Str){.len = sizeof(str) - 1, .p = (const uint8_t *)str, .hash = 0})
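Since sizeof on a string literal includes the trailing NUL byte, the macro subtracts one. Usage, with an illustrative variable:

Str type_name = STRING("str");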
For strings included in the input the runtime executes, a different approach is used. For instance, consider something like:
std::println("Hello" "World")
In this example both the lengths and hashes of "Hello" and "World" are known as soon as the lexer determines them to be string tokens. Abusing this, and the fact that all characters of a string have to be walked anyway, both the length and the hash are computed while the lexer recognises strings:
// ! error handling omitted for clarity
string: {
  // skip "
  l->pos++;
  size_t start = l->pos;
  uint64_t hash = FNV_OFFSET_BASIS;

  char cc = l->input.p[l->pos];
  for (; cc > 0 && cc != '"'; l->pos++, cc = l->input.p[l->pos]) {
    hash ^= cc;
    hash *= FNV_PRIME;
  }

  Token *t = CALL(a, request, sizeof(Token));
  *t = (Token){0};
  t->type = T_STRING;
  t->string = (Str){
      .p = l->input.p + start,
      .len = l->pos - start,
      .hash = hash,
  };

  l->pos++;
  return t;
}
Hashing is done via
FNV-1a.
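The constants used above are the well known 64 bit FNV-1a parameters:

#define FNV_OFFSET_BASIS 0xcbf29ce484222325UL
#define FNV_PRIME 0x100000001b3UL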
For computing hashes of strings created at runtime, a complementary Str_hash function is provided:
inline uint64_t Str_hash(const Str *str) {
  uint64_t hash = FNV_OFFSET_BASIS;
  for (size_t i = 0; i < str->len; i++) {
    hash ^= (uint64_t)str->p[i];
    hash *= FNV_PRIME;
  }
  return hash;
}
There is a small internal API built on top of this abstraction:
char Str_get(const Str *str, size_t index);
Str Str_from(const char *s);
Str Str_slice(const Str *str, size_t start, size_t end);
Str Str_concat(const Str *a, const Str *b, Allocator *alloc);
bool Str_eq(const Str *a, const Str *b);
void Str_debug(const Str *str);
int64_t Str_to_int64_t(const Str *str);
double Str_to_double(const Str *str);
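The cached length and hash pay off in comparisons; a sketch of how Str_eq can short circuit, though the actual implementation may differ:

bool Str_eq(const Str *a, const Str *b) {
  if (a->len != b->len) {
    return false; // cheap length check first
  }
  if (a->hash && b->hash && a->hash != b->hash) {
    return false; // computed, differing hashes can never be equal
  }
  return memcmp(a->p, b->p, a->len) == 0;
}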
While the str standard library package exposes str::append, str::lines and str::slice:
std::str::append("Hello" " World") # Hello World
std::str::slice("Hello" 0 2) # hel
std::str::slice("Hello" 1 2) # el
std::str::lines("hello\nworld") # [hello world]
Triggering garbage collection
Since allocation latency matters most for short-lived scripts, checking whether memory usage surpasses the garbage collection threshold on every allocation, or worse, on every bytecode instruction, would slow the virtual machine down substantially and introduce latency in both execution and allocation.
To circumvent this, the threshold check is only performed after leaving a scope. This has the benefit of not having to consider all the objects that were only alive inside said scope before the vm left it. Triggering the gc cycle by piggybacking on control flow and memory usage also makes the gc behaviour easier to reason about.
    case OP_RET: {
      Frame *old = vm->frame;
      if (vm->frame->prev) {
        vm->pc = vm->frame->return_to_bytecode;
        vm->frame = vm->frame->prev;
      }

      // [...] free lists for stack frames and stuff

      if (!vm->config.disable_gc &&
          vm->gc->allocated >= (vm->config.gc_size * vm->config.gc_limit)) {
        gc_cycle(vm->gc);
      }
      break;
    }
The runtime standard library exposes a gc package for both
std::runtime::gc::stats and triggering a garbage collection cycle manually:
std::runtime::gc::cycle(), as shown in the initial example at the start of
this article. The implementation and usage for the latter is shown below.
  if (!vm->config.disable_gc) {
    gc_cycle(vm->gc);
  }
}

var combination = std::str::append("Hello" " " "World")
std::println(combination)
std::runtime::gc::cycle()
Marking a Value
As previously explained, only values with is_heap set should be considered roots and managed by the gc. Thus mark has an early exit condition:
static inline void mark(Gc *gc, const Value *val) {
  if (!val || !val->is_heap) {
    return;
  }

  // [...]
}
Since Str wraps the underlying gc managed buffer (GC_OBJ_RAW) while not being allocated itself, this has to be handled differently from V_ARRAY and V_OBJ:
static inline void mark(Gc *gc, const Value *val) {
  // [...]

  void *payload = NULL;
  switch ((ValueType)val->type) {
  case V_STR: {
    GcHeader *raw = (GcHeader *)((uint8_t *)val->string.p - sizeof(GcHeader));
    raw->marked = true;
    // a string has no children, nothing left to do
    return;
  }
  case V_ARRAY:
    payload = (void *)val->array;
    break;
  case V_OBJ:
    payload = (void *)val->obj;
    break;
  default:
    return;
  }

  // [...]
}
The payload of arrays and objects is allocated, thus it is cast and assigned. If the resulting payload is less than sizeof(GcHeader), either a heap corruption or a gc bug occurred, thus we assert. Then an exit condition for already marked headers is created.
static inline void mark(Gc *gc, const Value *val) {
  // [...]

  ASSERT((uintptr_t)payload > sizeof(GcHeader),
         "payload too small, GC logic bug, this shouldnt happen");

  GcHeader *h = (GcHeader *)((char *)payload - sizeof(GcHeader));
  if (!h || h->marked) {
    return;
  }

  h->marked = 1;

  // [...]
}
Since we already handled a string, we can now switch on the GcHeader->type
bits and recursively mark each member of an array and each value of an object.
static inline void mark(Gc *gc, const Value *val) {
  // [...]

  switch ((ObjType)h->type) {
  case GC_OBJ_LIST: {
    List *l = (List *)payload;
    for (size_t i = 0; i < l->len; i++) {
      mark(gc, &l->arr[i]);
    }
    break;
  }
  case GC_OBJ_MAP: {
    Map *m = (Map *)payload;
    for (size_t i = 0; i < m->cap; i++) {
      MapEntry e = m->buckets[i];
      mark(gc, &e.value);
    }
    break;
  }
  default:
    return;
  }
}
Letting mark loose on registers and call frames
Since we need to mark our root set in gc_cycle, and our root set consists of the registers as well as the variables in the current and all previous call frames, we need to call mark on each:
void gc_cycle(Gc *gc) {
  if (!gc->allocated_since_last_cycle) {
    return;
  }
  gc->allocated_since_last_cycle = 0;
  Vm *vm = ((Vm *)gc->vm);
  for (size_t i = 0; i < REGISTERS; i++) {
    const Value *ri = vm->registers + i;
    mark(gc, ri);
  }

  for (Frame *f = vm->frame; f; f = f->prev) {
    for (size_t i = 0; i < f->variable_table.cap; i++) {
      MapEntry *me = &f->variable_table.buckets[i];
      if (me->hash) {
        mark(gc, &me->value);
      }
    }
  }

  // [...]
}
Copying marked GcHeader
After marking each Value and GcHeader referenced by the root set, we copy
those to the to-space / new allocator. We also rebuild the GcHeader chain
of copied headers.
void gc_cycle(Gc *gc) {
  // [...]

  GcHeader *new_head = NULL;
  size_t new_alloc = 0;
  for (GcHeader *h = gc->head; h; h = h->next) {
    if (!h->marked) {
      continue;
    }

    void *buf = CALL(gc->new, request, h->size + sizeof(GcHeader));
    GcHeader *nh = (GcHeader *)buf;
    void *new_payload = (char *)buf + sizeof(GcHeader);
    memcpy(nh, h, sizeof(GcHeader) + h->size);
    nh->next = new_head;
    new_head = nh;
    h->forward = (uintptr_t)new_payload;
    nh->forward = 0;
    nh->marked = 0;
    new_alloc += h->size;
  }

  // [...]
}
Forwarding pointers and Rewriting references
mark and rewrite_nested are subject to blowing the stack up; both will use a work list in the future.
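A minimal sketch of what such a work list could look like, replacing recursion with an explicit stack; WorkList and mark_iterative are illustrative names:

typedef struct {
  const Value **items;
  size_t len;
  size_t cap;
} WorkList;

static void mark_iterative(Gc *gc, WorkList *wl) {
  while (wl->len > 0) {
    const Value *v = wl->items[--wl->len];
    // mark v as in the recursive version, then push list members and map
    // values onto wl instead of recursing, bounding stack usage
  }
}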
Each GcHeader.forward is then used to update all references to the newly
copied regions:
static inline void *forward_ptr(void *payload) {
  if (!payload) {
    return NULL;
  }

  GcHeader *old = (GcHeader *)((char *)payload - sizeof(GcHeader));
  if (!old) {
    // not a gc object, hitting this is a bug
    unreachable();
  }

  if (old->type < GC_OBJ_RAW || old->type > GC_OBJ_MAP) {
    // either already in newspace or not a heap object; return payload unchanged
    return payload;
  }

  if (old->forward) {
    return (void *)old->forward;
  }

  // normally this would be unreachable, but since pg doesnt clear registers
  // after insertions into adts or the variable table, references to heap data
  // can be both in the variable table and registers at the same time, thus
  // allowing for multiple forward_ptr calls since there are multiple references
  // to a single point in memory. This results in double forwarding and other
  // shenanigans. Just returning the payload if no forward was found is correct
  // and a fix.
  return payload;
}
rewrite is really as simple as it gets:
static inline void rewrite(Gc *gc, Value *v) {
  if (!v->is_heap) {
    return;
  }

  switch ((ValueType)v->type) {
  case V_STR:
    v->string.p = (const uint8_t *)forward_ptr((void *)v->string.p);
    break;
  case V_ARRAY:
    v->array = (List *)forward_ptr((void *)v->array);
    break;
  case V_OBJ:
    v->obj = (Map *)forward_ptr((void *)v->obj);
    break;
  default:
    return;
  }
}
rewrite_nested iterates on rewrite by nesting the process for objects and arrays:
void rewrite_nested(Gc *gc, Value *v) {
  rewrite(gc, v);

  switch (v->type) {
  case V_ARRAY:
    for (size_t i = 0; i < v->array->len; i++) {
      rewrite_nested(gc, &v->array->arr[i]);
    }
    break;
  case V_OBJ:
    for (size_t i = 0; i < v->obj->cap; i++) {
      MapEntry *me = &v->obj->buckets[i];
      if (me->hash) {
        rewrite_nested(gc, &me->value);
      }
    }
    break;
  default:
    break;
  }
}
Put together in gc_cycle, starting with the registers, each entry in each variable table of each frame and each header in the GcHeader chain, all Values are rewritten.
void gc_cycle(Gc *gc) {
  // [...]

  for (size_t i = 0; i < REGISTERS; i++) {
    Value *ri = &vm->registers[i];
    rewrite(gc, ri);
  }

  for (Frame *f = vm->frame; f; f = f->prev) {
    for (size_t i = 0; i < f->variable_table.cap; i++) {
      MapEntry *me = &f->variable_table.buckets[i];
      if (me->hash) {
        rewrite_nested(gc, &me->value);
      }
    }
  }

  for (GcHeader *h = new_head; h; h = h->next) {
    switch (h->type) {
    case GC_OBJ_LIST: {
      List *l = (List *)((uint8_t *)h + sizeof(GcHeader));
      for (size_t i = 0; i < l->len; i++) {
        rewrite_nested(gc, &l->arr[i]);
      }
      break;
    }
    case GC_OBJ_MAP: {
      Map *m = (Map *)((uint8_t *)h + sizeof(GcHeader));
      for (size_t i = 0; i < m->cap; i++) {
        MapEntry *me = &m->buckets[i];
        if (me->hash) {
          rewrite_nested(gc, &me->value);
        }
      }
      break;
    }
    case GC_OBJ_STR: {
      Str *str = (Str *)((uint8_t *)h + sizeof(GcHeader));
      str->p = forward_ptr((void *)str->p);
      break;
    }
    default:
      break;
    }
  }

  // [...]
}
Bookkeeping after collection
void gc_cycle(Gc *gc) {
  // [...]
  gc->head = new_head;
  SWAP_STRUCT(gc->old, gc->new);
  CALL(gc->new, reset);
  gc->allocated = new_alloc;
}
After mark, copy and rewrite, the chain is attached to Gc.head, and the from-space and to-space swap places via SWAP_STRUCT:
#define SWAP_STRUCT(A, B)                                                      \
  do {                                                                         \
    _Static_assert(__builtin_types_compatible_p(typeof(A), typeof(B)),         \
                   "SWAP_STRUCT arguments must have identical types");         \
                                                                               \
    typeof(A) __swap_tmp = (A);                                                \
    (A) = (B);                                                                 \
    (B) = __swap_tmp;                                                          \
  } while (0)
The new allocator (new, since the previous old allocator is now the new one) is then reset with CALL, which is a macro:
#define CALL(SELF, METHOD, ...) (SELF)->METHOD((SELF)->ctx, ##__VA_ARGS__)
This expands to (gc->new)->reset((gc->new)->ctx);.
Tuning
As the virtual machine is configurable via the Vm_Config structure, so is the garbage collector:
typedef struct {
  // defines the maximum amount of memory purple garden is allowed to allocate,
  // if this is hit, the vm exits with a non zero code
  uint64_t max_memory;
  // define gc heap size in bytes
  uint64_t gc_size;
  // gc threshold in percent 5-99%
  double gc_limit;
  // disables the garbage collector
  bool disable_gc;

  // [...]
} Vm_Config;
This configuration is passed to Vm_new and attached to the Vm structure. It
allows for fully disabling the garbage collector, setting a total size for the
gc heap, and a threshold of the gc heap size in percent at which the gc should
start a cycle.
For non-embedding purposes, specifically for simply running the interpreter, each option, its short name, its default value, its type and its description is defined in the list of cli flags the purple garden binary accepts:
#define CLI_ARGS                                                               \
  X(gc_max, 0, GC_MIN_HEAP * 64l, LONG,                                        \
    "set hard max gc space in bytes, default is GC_MIN_HEAP*64")               \
  X(gc_size, 0, GC_MIN_HEAP * 2l, LONG, "define gc heap size in bytes")        \
  X(gc_limit, 0, 70.0, DOUBLE,                                                 \
    "instruct memory usage amount for gc to start collecting, in percent "     \
    "(5-99%)")                                                                 \
  X(no_gc, 0, false, BOOL, "disable garbage collection")
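This is the classic X macro pattern: each consumer defines X to pull out the columns it needs. A sketch of one possible expansion into an argument struct; the actual 6cl integration differs:

// map the type tags onto C types for this expansion
#define LONG long
#define DOUBLE double
#define BOOL bool

typedef struct {
#define X(NAME, SHORT, DEFAULT, TYPE, DESC) TYPE NAME;
  CLI_ARGS
#undef X
} Args;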
Producing a nice help text via a modified 6cl library (all other options omitted for clarity):
usage ./build/purple_garden_debug: [ +gc_max <1638400>] [ +gc_size <51200>]
                                   [ +gc_limit <70>] [ +no_gc]
        // [...]
        <file.garden>

Option:
        +gc_max <1638400>
                set hard max gc space in bytes, default is GC_MIN_HEAP*64
        +gc_size <51200>
                define gc heap size in bytes
        +gc_limit <70>
                instruct memory usage amount for gc to start collecting, in
                percent (5-99%)
        +no_gc
                disable garbage collection
        // [...]

        +h/+help
                help page and usage
Examples:
        ./build/purple_garden_debug +gc_max 1638400 +gc_size 51200 \
                +gc_limit 0 +no_gc
If the user sets these, the main purple garden entry point uses these cli options to set the Vm_Config:
Vm vm = Vm_new(
    (Vm_Config){
        .gc_size = a.gc_size,
        .gc_limit = a.gc_limit,
        .disable_gc = a.no_gc,
        .max_memory = a.gc_max,
        .disable_std = a.no_std,
        .no_env = a.no_env,
    }, pipeline_allocator, &gc);
Comparing mgc to other gc idioms and other reasoning
The obvious question at this point is one of purpose and complexity. Why design a garbage collector leveraging multiple paradigms when seemingly each paradigm on its own could suffice?
Why Bump allocation instead of malloc
First and foremost, keeping allocations fast is crucial for this kind of runtime. Only bump allocation fits this requirement, simply because syscalls are slow, many syscalls for many small objects are even slower, and hot paths would explode in cost. Since purple garden is optimised for allocation, not deallocation, an alternative to bump allocation is out of the question.
Why Copying, Compacting and resetting whole Heap regions instead of single value dealloc
Since bump allocation disallows deallocating single objects, a from-space/to-space copying collection approach is the most evident choice. Copying also has the benefit of bulk reclamation and reduced fragmentation, especially compared to single-object deallocation.
Why Register and variable tracing vs heap scanning
The root set is absolute and known at all times. Given these values, determining reachability by tracing them is the most efficient implementation, especially compared to walking the whole gc heap. Why waste cycles scanning the heap if all root values are known at all times and headers are chained?
Why not X
On the other side of the initial question is a list of concepts some garbage collectors of famous runtimes employ. These were purposely omitted, both to keep the garbage collector safe from feature creep, and because the runtime is single-threaded, the gc is stop-the-world, and I generally disallow concurrent mutation of any state, especially the gc state and root set:
- write barriers/incremental: Only necessary for concurrent collectors, but purple garden is single-threaded and stop-the-world. All objects are known at collection time.
- generational: optimised for long-lived objects, but purple-garden is an embeddable scripting language; it is designed and engineered to be cheap and quick to spin up for something like a plugin programming behaviour that runs on each event or entity in a game. It is specifically targeted at short lived execution and thus short lived allocations.
- RC/ARC: large runtime overhead, and cycles need extra, non-trivial detection logic, which would slow hot paths down
- escape analysis: optimises for reducing heap allocations, but mgc makes allocations extra cheap; plus, purple garden's bytecode is compiled straight from its abstract syntax tree into a compact format, so there is no intermediate representation with enough lifetime info for performing escape analysis
Gc invariants and assumptions
This is a short list included for completeness; violating one of these invariants results in undefined behavior.
- depends on registers not being cleared
- values not being duplicated incorrectly
- Value.type=V_STR and Value.is_heap=1 mean the string is heap allocated, else it's 'allocated' by the lexer (technically by mmaping the input into process memory, but you get the gist)
- vm discipline around setting Value.is_heap
(non-)Portability to Rust and Design Improvements
I'm currently in the process of rewriting purple garden in Rust for a plethora of reasons (these are not that easy to understand if one hasn't worked on a language runtime before):
- fewer abstractions, since I had to hand roll hashing, hash maps, arrays and array lists, which I can now just replace with std::collections
- a better and shorter bytecode compiler, since Rust just allows me to Vec<Op> with better appends and inserts than in my implementations
- a better bytecode format with fewer wasted bytes per instruction, since this allows me to have things like a single byte for RET and multiple bytes for things like ADD rA, rB via:

enum Op<'vm> {
    Add {
        dst: u8,
        lhs: u8,
        rhs: u8,
    },
    // [...]
    Ret {
        /// used for peephole optimisation, merging multiple RET into a single
        /// RET with a count
        times: u8,
    },
}

- way better error handling in each component of the runtime, since the current C interpreter just aborts via assertions
- easier compile time value interning via Rust hash maps (I had three hash maps for three different types), now I just use std::collections::HashMap
- the ARGS instruction for encoding both register offset and count for arguments to builtin and user defined function calls is no longer necessary, since both SYS and CALL encode their offset and argument count in their instruction:

enum Op<'vm> {
    // [...]
    Call {
        func: u16,
        args_start: u8,
        args_len: u8,
    },
    Sys {
        ptr: BuiltinFn<'vm>,
        args_start: u8,
        args_len: u8,
    },
}

- builtin functions for SYS no longer require type erasure, casting and indirection in the compiler. Previously I wasn't able to store a 64 bit pointer in a word, so the compiler created an array of pointers to builtins and the bytecode encoded an index into said array the vm could use to call the correct builtin; this is now just a fn pointer, see above for the reference in the bytecode and below for the definition:

type BuiltinFn<'vm> = fn(&mut Vm<'vm>, &[Value]);