The Manchester Garbage Collector and purple-garden's runtime

Recently, while in Manchester, I designed and implemented mgc - the manchester garbage collector #10 - for purple-garden. This article is a deep dive into its inner workings and why it is designed the way it is.

Intro

While garbage collection in general is a widely researched subject, combining multiple garbage collection techniques can yield better results than relying on a single strategy. The manchester garbage collector (mgc) is such a composite of several paradigms: allocation into preallocated memory regions, reachability analysis via recursive root set tracing, and compacting semi-space copying. None of this is novel on its own; the combination is specifically engineered for this runtime.

Combining these approaches enables fast allocation, low allocation latency and reduced fragmentation.

Purple-garden

mgc is specifically engineered for the purple-garden runtime. Purple-garden is a minimalist scripting language, designed and implemented with a focus on performance and a low memory profile, see: xnacly/purple-garden.

NIX
fn greeting :: greetee { std::println("hello world to:" greetee) }
greeting(std::env::get("USER")) # hello world to: $USER

var v = "Hello World"
std::println(std::runtime::type(v) std::len(v)) # str 11

Purple-garden is focused on embeddability and ease of use as a scripting language. For this, it has a small memory footprint, a high performance runtime and a minimalist, yet useful standard library. It is implemented in C as a register-based bytecode compiler and virtual machine.

While the above example doesn’t allocate memory in the runtime itself, and therefore isn’t a great fit for this article, it is still a good example of what purple garden is about.

A better example for some runtime heap pressure is the following recursive string concatenation:

NIX
fn f :: n {
    match {
        n < 100000 {
            f(n+1)
            var s1 = std::str::append("a" "b")
            var s2 = std::str::append("c" "d")
            var pair = std::str::append(s1 s2)
        }
        { std::println(n) }
    }
}
f(0)
std::runtime::gc::cycle()

Since f does not allocate enough memory to trigger the garbage collector (see Triggering Garbage Collection), we manually trigger a collection via std::runtime::gc::cycle(). Also, purple garden uses a separate allocator for call frames to reduce gc pressure.

Garbage collection stages

The mgc is divided into three stages, each requiring the runtime to stop for the duration of the collection. All stages together form a collection cycle, see below:

TEXT
+-------------+
|  Roots Set  |
| (VM regs,   |
| globals...) |
+------+------+
       |
       v
+-------------+
|  Mark Phase |
| Mark all    |
| live objects|
+------+------+
       |
       v              Copy
+-------------------+ live     +-------------------+
|   Old Bump Space  | objects  |   New Bump Space  |
|  (old allocator)  | -------> |  (new allocator)  |
+-------------------+          +-------------------+
       |
       v
+--------------+
| Reset Old    |
| Bump Alloc   |
| (len=0,pos=0)|
+------+-------+
       |
       v
+-------------+
| Swap Alloc  |
| old <-> new |
+-------------+

Mark

To manage allocations in said virtual machine, the garbage collector defines a set of starting points from which it walks all reachable allocations, marking them as reachable. This set is called the root set. In purple-garden (pg), the root set consists of the current registers and the variable table holding local variable bindings. Since variables in higher scopes are considered alive even while they aren’t in scope, the garbage collector has to take parent scopes into account. This is implemented by walking the root set recursively, setting the mark bit on all reachable objects.

For instance, in the following example, if the gc runs after z(): a and b are alive, while c isn’t anymore and can be cleaned up safely:

NIX
# alive
var a = allocating()
fn z {
    # dead
    var c = allocating()
}
# alive
var b = allocating()
z()

Copy

Once the alive subset A of the currently allocated set M is determined, the values in A can be safely copied from the previously used memory region (from-space) to the new space (to-space). Both spaces are backed by bump allocators, implemented as segmented lists of memory regions allocated through mmap.

After copying, the from-space is reset, but not deallocated. Then from-space and to-space are swapped, so the from-space is now the to-space, and vice versa.

Rewriting references

Since copying objects to new memory locations invalidates previously held references, said references must be rewritten to point to the newly copied memory regions.

For this, the previously established root set needs to be walked again and all references updated.

Reference implementation

Putting the above concepts into practice in C is the subject of the next chapters.

C
typedef struct {
  // from-space
  Allocator *old;
  // to-space
  Allocator *new;
  void *vm;
  GcHeader *head;
  size_t allocated;
  size_t allocated_since_last_cycle;
} Gc;

Gc gc_init(size_t gc_size) {
  size_t half = gc_size / 2;
  return (Gc){
      .old = bump_init(half, 0),
      .new = bump_init(half, 0),
      .head = NULL,
  };
}

Allocator is a simple struct abstracting allocation away behind function pointers, implemented for instance by a bump allocator:

C
typedef struct {
  // Allocator::ctx refers to an internal allocator state and owned memory
  // areas, for instance, a bump allocator would attach its meta data (current
  // position, cap, etc) here
  void *ctx;
  Stats (*stats)(void *ctx);
  void *(*request)(void *ctx, size_t size);
  void (*reset)(void *ctx);
  void (*destroy)(void *ctx);
} Allocator;
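
Such a function pointer table is essentially a hand-rolled vtable. As a minimal sketch of how a bump allocator might satisfy this interface (field names like block, pos and cap are my assumptions; the real implementation grows via mmap-backed segmented chunks, see below):

C
typedef struct {
  uint8_t *block; // backing memory
  size_t pos;     // current bump offset
  size_t cap;     // capacity of the backing memory
} BumpCtx;

static void *bump_request(void *ctx, size_t size) {
  BumpCtx *b = ctx;
  size = (size + 7) & ~(size_t)7; // keep handed out pointers 8B aligned
  if (b->pos + size > b->cap) {
    return NULL; // the real allocator would grow by a new mmap'ed chunk here
  }
  void *p = b->block + b->pos;
  b->pos += size;
  return p;
}

// resetting is what makes reclaiming a whole space cheap: no free() calls,
// just rewinding the offset
static void bump_reset(void *ctx) { ((BumpCtx *)ctx)->pos = 0; }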

Keeping track of allocations

For the garbage collector to know which regions it handed out, of what size and holding which type, each region is allocated with a metadata header. Said header consists of:

C
typedef struct GcHeader {
  unsigned int marked : 1;
  unsigned int type : 3;
  uintptr_t forward;
  uint16_t size;
  struct GcHeader *next;
} GcHeader;

The 3 type bits identify the payload as either raw bytes, a string, a list or a map, see below. The header also holds a forwarding pointer for reference rewriting, the size of the corresponding payload (object sizes are capped at 64KB, which matches current runtime needs) and a pointer to the next allocation. The latter enables a heap scan by iterating the headers as a linked list, which helps in the rewriting process.

C
typedef enum {
  // just bytes
  GC_OBJ_RAW = 0b000,
  // a string with a reference to an inner string, can be allocated or not
  GC_OBJ_STR = 0b001,
  // list has zero or more children
  GC_OBJ_LIST = 0b010,
  // map holds allocated buckets with owned children
  GC_OBJ_MAP = 0b011,
} ObjType;

So an allocation looks like the following:

TEXT
       allocation
      (raw pointer)
            |
            |
            v
 +----------------------+
 | GcHeader             | <-- header
 +----------------------+
 |                      |
 | payload (size B)     | <-- data handed out as ptr to the user
 |                      |
 +----------------------+

Each GcHeader is 32B, thus each heap allocation has an added overhead of 32B.
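
Since every later phase does pointer arithmetic with sizeof(GcHeader), pinning this size down at compile time is cheap insurance. A sketch of such a check (not necessarily present in the actual source):

C
// fails the build if a field or padding change bumps the header size
_Static_assert(sizeof(GcHeader) == 32, "GcHeader must stay 32B");

gc_request prepends this header to every region it hands out: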

C
void *gc_request(Gc *gc, size_t size, ObjType t) {
  void *allocation = gc->old->request(gc->old->ctx, size + sizeof(GcHeader));
  void *payload = (char *)allocation + sizeof(GcHeader);
  GcHeader *h = (GcHeader *)allocation;
  h->type = t;
  h->marked = 0;
  h->size = size;
  h->next = gc->head;
  gc->head = h;
  gc->allocated_since_last_cycle += size;
  gc->allocated += size;
  return (void *)payload;
}
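
A hypothetical call site, allocating the backing storage for a list; the caller only ever sees the payload pointer, the header sits right before it:

C
List *l = gc_request(gc, sizeof(List), GC_OBJ_LIST);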

As explained before, after the mark phase, the alive objects are copied. This process also involves walking the chained list of headers, but is restricted to those with GcHeader.marked toggled. Each object is copied into to-space and the GcHeader.forward field is set to the new memory region.

(Almost) Zero cost abstractions and small Values

Values in the purple garden runtime use three bits for encoding their type:

C
typedef enum {
  V_NONE,   // zero value, comparable to Rust's Option::None
  V_STR,    // string view
  V_DOUBLE, // floating point number
  V_INT,    // integer

  V_TRUE,   // booleans
  V_FALSE,

  V_ARRAY,  // containers / adts
  V_OBJ,
} ValueType;

A union stores the data for each type (excluding true and false, since these require no further data): Str for the custom string abstraction, List for the dynamically growing array, Map for the hash map, a double and an int64_t.

C
typedef struct Value {
  unsigned int type : 3;
  union {
    Str string;
    List *array;
    Map *obj;
    double floating;
    int64_t integer;
  };
} Value;

A single bit, is_some, marks a value as optional, making each type optionally representable:

C
typedef struct Value {
  unsigned int type : 3;
  unsigned int is_some : 1;
  union {
    Str string;
    List *array;
    Map *obj;
    double floating;
    int64_t integer;
  };
} Value;

Using a bit tag on Value itself means every type except V_NONE is valid with is_some being either true or false, while V_NONE can only be combined with is_some being false. This invariant is checked in the runtime, and a dedicated function determines whether an instance of Value is an optional:

C
inline bool Value_is_opt(const Value *v) {
  return v->type == V_NONE || v->is_some;
}
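
To illustrate the resulting states (illustrative compound literals, not code from the runtime):

C
Value five = (Value){.type = V_INT, .integer = 5};               // plain 5
Value some = (Value){.type = V_INT, .is_some = 1, .integer = 5}; // Some(5)
Value none = (Value){.type = V_NONE};                            // None
// Value_is_opt(&five) == false, for some and none it returns true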

The big advantage of this approach is a zero allocation, zero copy and zero indirection optional implementation in the runtime, since any value can be turned into an optional by setting the Value.is_some bit. If optionality were instead implemented via a V_OPTION type, with Value.inner being the wrapped value and Value.is_some indicating whether the option holds something, the recursive nature of that representation would always require an allocation.

NIX
# optionals
var opt_none = std::None()
var opt_some = std::Some([])
std::opt::or(opt_none "anything else") # -> "anything else"
std::opt::unwrap(opt_some) # -> []
std::opt::expect(opt_some "would panic with this message") # -> []
std::opt::is_some(opt_some) # -> true
std::opt::is_none(opt_none) # -> true

Standard library functions interacting with optionals live in std::opt, and many functions in other packages return or accept optional values. Functions in this package are trivial to implement with this value design, for instance std::opt::{some, none, or}:

C
static void pg_builtin_opt_some(Vm *vm) {
  Value inner = ARG(0);
  inner.is_some = true;
  RETURN(inner);
}

// INTERNED_NONE is a static Value instance that's always available to the runtime

static void pg_builtin_opt_none(Vm *vm) { RETURN(*INTERNED_NONE); }

static void pg_builtin_opt_or(Vm *vm) {
  Value lhs = ARG(0);
  Value rhs = ARG(1);
  ASSERT(Value_is_opt(&lhs), "Or: lhs wasnt an Optional");
  if (!lhs.is_some) {
    RETURN(rhs);
  } else {
    lhs.is_some = false;
    RETURN(lhs);
  }
}

The design also allows for numeric Value instances without allocations, trading a larger Value size for fewer heap allocations in hot paths of the runtime.

Differentiating heap and non-heap values

For purple garden, booleans, doubles and integers require no heap allocation. Strings known at compile time are simply windows into the initial input the interpreter gets, therefore not heap allocated via gc_request and not subject to marking or collection.

Thus a marker for distinguishing between heap and non-heap values is necessary: Value.is_heap. Only values with this bit set are considered for marking and collection and passed to the copy and rewrite phases of the garbage collector.

C
typedef struct Value {
  unsigned int is_heap : 1;
  unsigned int type : 3;
  union {
    Str string;
    List *array;
    Map *obj;
    double floating;
    int64_t integer;
  };
} Value;

Dynamically growable bump allocation

Info - Growable bump allocators

For an in depth commentary on bump allocation, growable chunks, mmap and segmented lists built on top of these concepts, see the C part of my recent article: Porting a Segmented List From C to Rust: C Implementation.

Since allocations are expensive, requiring system interaction via syscalls, purple garden is designed to reduce syscalls as much as possible. To achieve this, a growing bump allocator is used. Upon getting a request the bump allocator can't satisfy, the allocator requests a new chunk of memory from the operating system. This chunk is double the size of the previous one, while the first chunk is as large as the page size of the system it is running on.
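
A sketch of that growth logic (names are mine, the real implementation lives in the segmented list described in the info box above):

C
#include <sys/mman.h>
#include <unistd.h>

// first chunk matches the system page size, each following chunk doubles
static void *grow_chunk(size_t *chunk_size) {
  size_t size = *chunk_size ? *chunk_size * 2 : (size_t)sysconf(_SC_PAGESIZE);
  void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED) {
    return NULL;
  }
  *chunk_size = size;
  return mem;
}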

String abstraction

Since C style strings suck due to not having a length associated with them at all times, purple garden has a string abstraction built in. It improves on the computationally expensive C string interactions by making hashing and length first class citizens. Str is a view into a buffer it doesn’t manage: it holds a pointer to a buffer, a length and a hash. Its backing storage is immutable. Str is cheap to copy and therefore kept by value, not by reference, in Value.

C
typedef struct __Str {
  uint64_t hash;
  uint32_t len;
  const uint8_t *p;
} Str;

Str.len is computed at creation time. A static string that’s known when compiling the runtime (things like error messages or string representations of value type names) is passed to the STRING macro:

C
#define STRING(str)                                                            \
  ((Str){.len = sizeof(str) - 1, .p = (const uint8_t *)str, .hash = 0})
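
A hypothetical call site; sizeof on the literal yields the length (minus the NUL terminator) at compile time, so no strlen is ever needed:

C
Str type_name = STRING("str"); // .len == 3, .p points at the literal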

For strings included in the input the runtime executes, a different approach is used. For instance, consider something like:

NIX
std::println("Hello" "World")

In this example both the sizes and hashes of "Hello" and "World" are known as soon as the lexer determines them to be string tokens. Exploiting this, and the fact that all characters of a string have to be walked anyway, both the length and the hash are computed while the lexer recognises strings:

C
// ! error handling omitted for clarity
string: {
    // skip "
    l->pos++;
    size_t start = l->pos;
    uint64_t hash = FNV_OFFSET_BASIS;

    char cc = l->input.p[l->pos];
    for (; cc > 0 && cc != '"'; l->pos++, cc = l->input.p[l->pos]) {
        hash ^= cc;
        hash *= FNV_PRIME;
    }

    Token *t = CALL(a, request, sizeof(Token));
    *t = (Token){0};
    t->type = T_STRING;
    t->string = (Str){
        .p = l->input.p + start,
        .len = l->pos - start,
        .hash = hash,
    };

    l->pos++;
    return t;
}

Hashing is done via FNV-1a. For computing hashes of strings created at runtime, a complementary Str_hash function is provided:

C
inline uint64_t Str_hash(const Str *str) {
  uint64_t hash = FNV_OFFSET_BASIS;
  for (size_t i = 0; i < str->len; i++) {
    hash ^= (uint64_t)str->p[i];
    hash *= FNV_PRIME;
  }
  return hash;
}

There is a small internal API built on top of this abstraction:

C
char Str_get(const Str *str, size_t index);
Str Str_from(const char *s);
Str Str_slice(const Str *str, size_t start, size_t end);
Str Str_concat(const Str *a, const Str *b, Allocator *alloc);
bool Str_eq(const Str *a, const Str *b);
void Str_debug(const Str *str);
int64_t Str_to_int64_t(const Str *str);
double Str_to_double(const Str *str);
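
Based on these signatures, usage might look like the following sketch (alloc being some Allocator *; Str_concat presumably allocates the combined buffer through it while the inputs stay untouched):

C
Str hello = STRING("Hello");
Str world = STRING(" World");
Str both = Str_concat(&hello, &world, alloc); // view over "Hello World"
Str_debug(&both);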

While the str standard library package exposes str::append, str::lines and str::slice:

NIX
std::str::append("Hello" " World") # Hello World
std::str::slice("Hello" 0 2) # Hel
std::str::slice("Hello" 1 2) # el
std::str::lines("hello\nworld") # [hello world]

Triggering garbage collection

Since allocation latency matters most for short-lived scripts, checking whether memory usage surpasses the garbage collection threshold on every allocation, or worse, on each bytecode instruction, would slow the virtual machine down substantially and introduce latency in both execution and allocation.

To circumvent this, the threshold check is only performed after leaving a scope. This has the benefit of omitting all objects one would have to consider alive in the scope before the vm left it. Triggering the gc cycle by piggybacking on control flow and memory usage makes the gc behaviour easier to reason about.

C
    case OP_RET: {
      Frame *old = vm->frame;
      if (vm->frame->prev) {
        vm->pc = vm->frame->return_to_bytecode;
        vm->frame = vm->frame->prev;
      }

      // [...] free lists for stack frames and stuff

      if (!vm->config.disable_gc &&
          vm->gc->allocated >= (vm->config.gc_size * vm->config.gc_limit)) {
        gc_cycle(vm->gc);
      }
      break;
    }

The runtime standard library exposes a gc package with std::runtime::gc::stats and a way to trigger a garbage collection cycle manually: std::runtime::gc::cycle(), as shown in the initial example at the start of this article. The implementation and usage of the latter are shown below.

C
// signature reconstructed following the pg_builtin_* pattern shown earlier
static void pg_builtin_gc_cycle(Vm *vm) {
  if (!vm->config.disable_gc) {
    gc_cycle(vm->gc);
  }
}
NIX
var combination = std::str::append("Hello" " " "World")
std::println(combination)
std::runtime::gc::cycle()

Marking a Value

As previously explained, only values with is_heap set are considered roots and managed by the gc. Thus mark has an early exit condition:

C
static inline void mark(Gc *gc, const Value *val) {
  if (!val || !val->is_heap) {
    return;
  }

  // [...]
}

Since Str wraps the underlying gc managed buffer (GC_OBJ_RAW) while not being allocated itself, it has to be handled differently from V_ARRAY and V_OBJ:

C
static inline void mark(Gc *gc, const Value *val) {
  // [...]

  void *payload = NULL;
  switch ((ValueType)val->type) {
  case V_STR: {
    GcHeader *raw = (GcHeader *)((uint8_t *)val->string.p - sizeof(GcHeader));
    raw->marked = true;
    // strings have no children, no need to fall through to the generic path
    return;
  }
  case V_ARRAY:
    payload = (void *)val->array;
    break;
  case V_OBJ:
    payload = (void *)val->obj;
    break;
  default:
    return;
  }

  // [...]
}

The payloads of arrays and objects are allocated, thus they are cast and assigned. If the resulting payload is less than sizeof(GcHeader), either a heap corruption or a gc bug occurred, thus we assert. Then an exit condition for already marked headers follows.

C
static inline void mark(Gc *gc, const Value *val) {
  // [...]

  ASSERT((uintptr_t)payload > sizeof(GcHeader),
         "payload too small, GC logic bug, this shouldnt happen");

  GcHeader *h = (GcHeader *)((char *)payload - sizeof(GcHeader));
  if (!h || h->marked) {
    return;
  }

  h->marked = 1;

  // [...]
}

Since strings are already handled, we can now switch on the GcHeader->type bits and recursively mark each member of an array and each value of an object.

C
static inline void mark(Gc *gc, const Value *val) {
  // [...]

  switch ((ObjType)h->type) {
  case GC_OBJ_LIST: {
    List *l = (List *)payload;
    for (size_t i = 0; i < l->len; i++) {
      mark(gc, &l->arr[i]);
    }
    break;
  }
  case GC_OBJ_MAP: {
    Map *m = (Map *)payload;
    for (size_t i = 0; i < m->cap; i++) {
      MapEntry e = m->buckets[i];
      mark(gc, &e.value);
    }
    break;
  }
  default:
    return;
  }
}

Letting mark loose on registers and call frames

Since gc_cycle needs to mark the root set, which consists of the registers and the variables in the current and all previous call frames, we call mark on each:

C
void gc_cycle(Gc *gc) {
  if (!gc->allocated_since_last_cycle) {
    return;
  }
  gc->allocated_since_last_cycle = 0;
  Vm *vm = ((Vm *)gc->vm);
  for (size_t i = 0; i < REGISTERS; i++) {
    const Value *ri = vm->registers + i;
    mark(gc, ri);
  }

  for (Frame *f = vm->frame; f; f = f->prev) {
    for (size_t i = 0; i < f->variable_table.cap; i++) {
      MapEntry *me = &f->variable_table.buckets[i];
      if (me->hash) {
        mark(gc, &me->value);
      }
    }
  }

  // [...]
}

Copying marked GcHeader

After marking each Value and GcHeader reachable from the root set, we copy those into the to-space / new allocator, rebuilding the GcHeader chain of copied headers in the process.

C
void gc_cycle(Gc *gc) {
  // [...]

  GcHeader *new_head = NULL;
  size_t new_alloc = 0;
  for (GcHeader *h = gc->head; h; h = h->next) {
    if (!h->marked) {
      continue;
    }

    void *buf = CALL(gc->new, request, h->size + sizeof(GcHeader));
    GcHeader *nh = (GcHeader *)buf;
    void *new_payload = (char *)buf + sizeof(GcHeader);
    memcpy(nh, h, sizeof(GcHeader) + h->size);
    nh->next = new_head;
    new_head = nh;
    h->forward = (uintptr_t)new_payload;
    nh->forward = 0;
    nh->marked = 0;
    new_alloc += h->size;
  }

  // [...]
}

Forwarding pointers and Rewriting references

Note: mark and rewrite_nested are recursive and thus subject to blowing the stack up on deeply nested structures; both will use an explicit work list in the future.

Each GcHeader.forward is then used to update all references to the newly copied regions:

C
static inline void *forward_ptr(void *payload) {
  if (!payload) {
    return NULL;
  }

  GcHeader *old = (GcHeader *)((char *)payload - sizeof(GcHeader));
  if (!old) {
    // not a gc object, hitting this is a bug
    unreachable();
  }

  if (old->type < GC_OBJ_RAW || old->type > GC_OBJ_MAP) {
    // either already in newspace or not a heap object; return payload unchanged
    return payload;
  }

  if (old->forward) {
    return (void *)old->forward;
  }

  // normally this would be unreachable, but since pg doesn't clear registers
  // after insertions into adts or the variable table, references to heap data
  // can live in both the variable table and registers at the same time,
  // allowing multiple forward_ptr calls for a single point in memory. This
  // results in double forwarding and other shenanigans. Just returning the
  // payload if no forward was found is correct and a fix.
  return payload;
}

rewrite is really as simple as it gets:

C
static inline void rewrite(Gc *gc, Value *v) {
  if (!v->is_heap) {
    return;
  }

  switch ((ValueType)v->type) {
  case V_STR:
    v->string.p = (const uint8_t *)forward_ptr((void *)v->string.p);
    break;
  case V_ARRAY:
    v->array = (List *)forward_ptr((void *)v->array);
    break;
  case V_OBJ:
    v->obj = (Map *)forward_ptr((void *)v->obj);
    break;
  default:
    return;
  }
}

rewrite_nested iterates on rewrite by nesting the process for objects and arrays:

C
void rewrite_nested(Gc *gc, Value *v) {
  rewrite(gc, v);

  switch (v->type) {
  case V_ARRAY:
    for (size_t i = 0; i < v->array->len; i++) {
      rewrite_nested(gc, &v->array->arr[i]);
    }
    break;
  case V_OBJ:
    for (size_t i = 0; i < v->obj->cap; i++) {
      MapEntry *me = &v->obj->buckets[i];
      if (me->hash) {
        rewrite_nested(gc, &me->value);
      }
    }
    break;
  default:
    break;
  }
}

Put together in gc_cycle: starting with the registers, then each entry in each variable table of each frame, and finally each header in the new GcHeader chain, all Values are rewritten.

C
void gc_cycle(Gc *gc) {
  // [...]

  for (size_t i = 0; i < REGISTERS; i++) {
    Value *ri = &vm->registers[i];
    rewrite(gc, ri);
  }

  for (Frame *f = vm->frame; f; f = f->prev) {
    for (size_t i = 0; i < f->variable_table.cap; i++) {
      MapEntry *me = &f->variable_table.buckets[i];
      if (me->hash) {
        rewrite_nested(gc, &me->value);
      }
    }
  }

  for (GcHeader *h = new_head; h; h = h->next) {
    switch (h->type) {
    case GC_OBJ_LIST: {
      List *l = (List *)((uint8_t *)h + sizeof(GcHeader));
      for (size_t i = 0; i < l->len; i++) {
        rewrite_nested(gc, &l->arr[i]);
      }
      break;
    }
    case GC_OBJ_MAP: {
      Map *m = (Map *)((uint8_t *)h + sizeof(GcHeader));
      for (size_t i = 0; i < m->cap; i++) {
        MapEntry *me = &m->buckets[i];
        if (me->hash) {
          rewrite_nested(gc, &me->value);
        }
      }
      break;
    }
    case GC_OBJ_STR: {
      Str *str = (Str *)((uint8_t *)h + sizeof(GcHeader));
      str->p = forward_ptr((void *)str->p);
      break;
    }
    default:
      break;
    }
  }

  // [...]
}

Bookkeeping after collection

C
void gc_cycle(Gc *gc) {
  // [...]
  gc->head = new_head;
  SWAP_STRUCT(gc->old, gc->new);
  CALL(gc->new, reset);
  gc->allocated = new_alloc;
}

After mark, copy and rewrite, the new header chain is attached to Gc.head, and the from- and to-space swap places via SWAP_STRUCT:

C
#define SWAP_STRUCT(A, B)                                                      \
  do {                                                                         \
    _Static_assert(__builtin_types_compatible_p(typeof(A), typeof(B)),         \
                   "SWAP_STRUCT arguments must have identical types");         \
                                                                               \
    typeof(A) __swap_tmp = (A);                                                \
    (A) = (B);                                                                 \
    (B) = __swap_tmp;                                                          \
  } while (0)

The new allocator (new, since the previous old allocator is now the new one) is reset with CALL, which is a macro:

C
#define CALL(SELF, METHOD, ...) (SELF)->METHOD((SELF)->ctx, ##__VA_ARGS__)

Expanding to (gc->new)->reset((gc->new)->ctx);.

Tuning

As the virtual machine is configurable via the Vm_Config structure, so is the garbage collector:

C
typedef struct {
  // defines the maximum amount of memory purple garden is allowed to allocate,
  // if this is hit, the vm exits with a non zero code
  uint64_t max_memory;
  // define gc heap size in bytes
  uint64_t gc_size;
  // gc threshold in percent 5-99%
  double gc_limit;
  // disables the garbage collector
  bool disable_gc;

  // [...]
} Vm_Config;

This configuration is passed to Vm_new and attached to the Vm structure. It allows fully disabling the garbage collector, setting a total size for the gc heap, and setting a threshold, as a percentage of the gc heap size, at which the gc should start a cycle.

For non-embedding purposes, specifically for simply running the interpreter, each option, its short name, its default value, its type and its description is defined in the list of cli flags the purple garden binary accepts:

C
#define CLI_ARGS                                                               \
  X(gc_max, 0, GC_MIN_HEAP * 64l, LONG,                                        \
    "set hard max gc space in bytes, default is GC_MIN_HEAP*64")               \
  X(gc_size, 0, GC_MIN_HEAP * 2l, LONG, "define gc heap size in bytes")        \
  X(gc_limit, 0, 70.0, DOUBLE,                                                 \
    "instruct memory usage amount for gc to start collecting, in percent "     \
    "(5-99%)")                                                                 \
  X(no_gc, 0, false, BOOL, "disable garbage collection")

Producing a nice help text via a modified 6cl library (all other options omitted for clarity):

TEXT
usage ./build/purple_garden_debug: [ +gc_max <1638400>] [ +gc_size <51200>]
                                   [ +gc_limit <70>] [ +no_gc]
                                   // [...]
                                   <file.garden>

Option:
          +gc_max <1638400>
                set hard max gc space in bytes, default is GC_MIN_HEAP*64
          +gc_size <51200>
                define gc heap size in bytes
          +gc_limit <70>
                instruct memory usage amount for gc to start collecting, in
                percent (5-99%)
          +no_gc
                disable garbage collection
          // [...]

          +h/+help
                help page and usage
Examples:
        ./build/purple_garden_debug +gc_max 1638400 +gc_size 51200 \
                                    +gc_limit 0 +no_gc

If the user sets these, the main purple garden entry point uses the cli options to fill the Vm_Config:

C
Vm vm = Vm_new(
  (Vm_Config){
      .gc_size = a.gc_size,
      .gc_limit = a.gc_limit,
      .disable_gc = a.no_gc,
      .max_memory = a.gc_max,
      .disable_std = a.no_std,
      .no_env = a.no_env,
  }, pipeline_allocator, &gc);

Comparing mgc to other gc idioms and other reasoning

The obvious question at this point is of purpose and complexity. Why design a garbage collector leveraging multiple paradigms when seemingly each paradigm on its own could suffice?

Why Bump allocation instead of malloc

First and foremost, keeping allocations fast is crucial for this kind of runtime. For this, only bump allocation is an option, simply because syscalls are slow, many syscalls for many small objects are even slower, and hot paths would explode. Since purple garden is optimised for allocation, not deallocation, an alternative to bump allocation is out of the question.

Why Copying, Compacting and resetting whole Heap regions instead of single value dealloc

Since bump allocation disallows deallocating single objects, a from-space/to-space copying approach is the most evident. Copying also has the benefit of bulk reclamation and reduced fragmentation, especially compared to single-object deallocation.

Why Register and variable tracing vs heap scanning

The root set is absolute and known at all times. Starting from these values, determining reachability is the most efficient implementation, especially compared to walking the whole gc heap. Why waste cycles scanning the heap if all root values are known at all times and headers are chained?

Why not X

On the other side of the initial question is a list of concepts some garbage collectors of famous runtimes employ. These were purposely omitted, both to keep the garbage collector safe from feature creep and due to the runtime being single-threaded, the gc being stop-the-world, and me generally disallowing concurrent mutation of any state, especially the gc state and root set:

  • write barriers/incremental: Only necessary for concurrent collectors, but purple garden is single-threaded and stop-the-world. All objects are known at collection time.
  • generational: optimised for long-lived objects, but purple-garden is an embeddable scripting language, designed and engineered to be cheap and quick to spin up for something like a plugin programming behaviour running on each event or entity in a game. It is specifically targeted at short-lived execution and thus short-lived allocations.
  • RC/ARC: large runtime overhead, and cycles need extra non-trivial detection logic, which would slow hot paths down
  • escape analysis: optimises for reducing heap allocation, but mgc makes allocations extra cheap; plus purple garden's bytecode is compiled from its abstract syntax tree into a compact format, so there is no intermediate representation with enough lifetime info for performing escape analysis

Gc invariants and assumptions

This is a short list included for completeness; violating one of these invariants results in undefined behavior.

  1. The gc depends on registers not being cleared
  2. Values must not be duplicated incorrectly
  3. Value.type=V_STR and Value.is_heap=1 mean the string is heap allocated, else it's ‘allocated’ by the lexer (technically by mmaping the input into process memory, but you get the gist)
  4. The vm must be disciplined about setting Value.is_heap

(non-)Portability to Rust and Design Improvements

I’m currently in the process of rewriting purple garden in Rust for a plethora of reasons (these are not that easy to understand if one hasn’t worked on a language runtime before):

  • fewer abstractions, since I had to hand roll hashing, hash maps, arrays and array lists, which I can now just replace with std::collections

  • better and shorter bytecode compiler, since Rust just lets me use a Vec<Op> with better appends and inserts than my hand-rolled implementations

  • better bytecode format and fewer wasted bytes for each instruction, since this allows me to have things like a single byte for RET and multiple bytes for things like ADD rA, rB via:

    RUST
    enum Op<'vm> {
        Add {
            dst: u8,
            lhs: u8,
            rhs: u8,
        },
        // [...]
        Ret {
            /// used for peephole optimisation, merging multiple RET into a single
            /// RET with a count
            times: u8,
        },
    }
  • way better error handling in each component of the runtime, since the current C interpreter just aborts via assertions

  • easier compile time value interning via Rust hash maps (I had three hand-rolled hash maps for three different types), now I just use std::collections::HashMap

  • the ARGS instruction for encoding both register offset and count for arguments to builtin and user defined function calls is no longer necessary, since both SYS and CALL encode their offset and argument count in their instruction:

    RUST
    enum Op<'vm> {
        // [...]
        Call {
            func: u16,
            args_start: u8,
            args_len: u8,
        },
        Sys {
            ptr: BuiltinFn<'vm>,
            args_start: u8,
            args_len: u8,
        },
    }
  • Builtin functions for SYS no longer require type erasure, casting or indirection in the compiler. Previously I wasn’t able to store a 64 bit pointer in a word, so the compiler created an array of pointers to builtins and the bytecode encoded an index into said array, which the vm could use to call the correct builtin. This is now just a fn pointer, see above for the reference in the bytecode and below for the definition:

    RUST
    type BuiltinFn<'vm> = fn(&mut Vm<'vm>, &[Value]);