Compiling Match Statements to Bytecode

I like Go's headless switch statements as a replacement for if-else-if chains:

GO
 1import ("time"; "fmt")
 2t := time.Now()
 3switch {
 4case t.Hour() < 12:
 5    fmt.Println("Good morning!")
 6case t.Hour() < 17:
 7    fmt.Println("Good afternoon.")
 8default:
 9    fmt.Println("Good evening.")
10}

So I decided purple garden will have these as its single control structure; the above, for instance, would look like this:

PYTHON
1import ("time" "io")
2let t = time.now()
3let greeting = match {
4    t.hour() < 12 { "morning" }
5    t.hour() < 17 { "afternoon" }
6    { "evening" }
7}
8io.println("Good" greeting)

Match statements

Below is the easiest and most useless match statement there is, for converting a boolean to its integer representation:

PYTHON
1match {
2    true { 1 }
3    { 0 }
4}

The general format is a list of cases, each consisting of a condition evaluating to a boolean and a body. All bodies must resolve to the same type and a default branch is required.
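Semantically, such a match behaves like a chained conditional. A rough Rust analogue of the greeting example from the intro (hypothetical code, purely to illustrate the evaluation order and why all bodies must share a type):

```rust
// Cases are tried top to bottom, the first true condition selects
// its body, and the final else plays the role of purple garden's
// mandatory default branch. All arms share one type: &'static str.
fn greeting(hour: u32) -> &'static str {
    if hour < 12 {
        "morning"
    } else if hour < 17 {
        "afternoon"
    } else {
        "evening"
    }
}

fn main() {
    assert_eq!(greeting(9), "morning");
    assert_eq!(greeting(13), "afternoon");
    assert_eq!(greeting(20), "evening");
}
```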

Pipeline Architecture

Purple garden's architecture revolves around an intermediate representation based on a list of functions, each holding a list of blocks. Each block has a list of input params, a list of instructions and a single terminator. The instructions are SSA based and the blocks containing them are basic blocks, meaning each value is defined immutably and exactly once. This also means params to blocks and params in terminators are explicit (which enables the omission of phi nodes).
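To make the phi node point concrete, here is a minimal sketch of the idea (all names here are mine, not purple garden's actual types): a join block receives its value through an explicit parameter carried by the incoming jump, so no phi node is needed.

```rust
// Minimal sketch of explicit block params replacing phi nodes.
// Illustrative names only, not purple garden's real IR types.
#[derive(Clone, Copy)]
enum Term {
    // conditional branch to one of two blocks, passing no values
    Br { cond: bool, yes: usize, no: usize },
    // unconditional jump passing one value into the target's param
    Jmp { target: usize, arg: i64 },
    // end of function, returning the current block's single param
    Ret,
}

struct Block {
    term: Term,
}

// Walk the blocks; the value flowing into a block's parameter is
// exactly the argument of the jump that entered it.
fn run(blocks: &[Block], entry: usize) -> i64 {
    let mut cur = entry;
    let mut param = 0i64;
    loop {
        match blocks[cur].term {
            Term::Br { cond, yes, no } => cur = if cond { yes } else { no },
            Term::Jmp { target, arg } => {
                param = arg;
                cur = target;
            }
            Term::Ret => return param,
        }
    }
}

fn main() {
    // b0: br cond -> b1 / b2; b1: jmp b3(1); b2: jmp b3(0); b3: ret
    let blocks = vec![
        Block { term: Term::Br { cond: true, yes: 1, no: 2 } },
        Block { term: Term::Jmp { target: 3, arg: 1 } },
        Block { term: Term::Jmp { target: 3, arg: 0 } },
        Block { term: Term::Ret },
    ];
    assert_eq!(run(&blocks, 0), 1);
}
```

The jump's argument plays exactly the role a phi node would play in classic SSA: it selects a value based on which predecessor control flow came from.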

The IR sits at the intersection of the abstract syntax tree produced by parsing the tokenized input and the three backends (currently only the bytecode backend targeting the typed register based virtual machine is implemented). This architecture enables decoupled codegen and a list of optimisations.

Parsing

Parsing consumes the tokens produced by lexical analysis / tokenisation and builds a tree representing the source code.

RUST
 1// as called in main()
 2let mut lexer = Lexer::new(&input);
 3let ast = match Parser::new(&mut lexer).and_then(|n| n.parse()) {
 4    Ok(a) => a,
 5    Err(e) => {
 6        let lines = str::from_utf8(&input)
 7            .unwrap()
 8            .lines()
 9            .collect::<Vec<&str>>();
10        e.render(&lines);
11        std::process::exit(1);
12    }
13};

As shown in the intro, the match statement follows this format:

TEXT
1match {
2    <cond> { <body> }
3    { <default> }
4}
5
6Expr      ::= ...
7Block     ::= "{" Expr "}"
8MatchStmt ::= "match" "{" (Expr Block)+ Block "}"

Pg uses a combination of recursive descent and Pratt parsing. I will focus on the former here, since the latter doesn't apply.

TEXT
 1match   + Parser::parser
 2match    \_ Parser::parser
 3match     \_ Parser::parse_prefix
 4match      \_ Parser::parse_match
 5True       |\_ Parser::parse_expr
 6           | |
 7I("1")     |  \_ Parser::parse_prefix
 8I("1")     |   \_ Parser::parse_expr
 9           |
10I("0")      \_ Parser::parse_prefix
11I("0")       \_ Parser::parse_expr

The call stack above shows all function calls necessary to build the abstract syntax tree:

LISP
1(match
2 (True I("1"))
3 (I("0"))
4)

Below I included the implementation of Parser::parse_match:

RUST
 1    fn parse_match(&mut self) -> Result<Node<'p>, PgError> {
 2        self.next()?;
 3        let mut cases = vec![];
 4        let mut default = None;
 5        let tok = self.cur().clone();
 6
 7        self.expect(Type::CurlyLeft)?;
 8        while self.cur().t != Type::CurlyRight {
 9            // default case
10            if self.cur().t == Type::CurlyLeft {
11                let default_token = self.cur().clone();
12                self.expect(Type::CurlyLeft)?;
13                let mut default_body = vec![];
14                while self.cur().t != Type::CurlyRight {
15                    default_body.push(self.parse_prefix()?);
16                }
17                self.expect(Type::CurlyRight)?;
18                default = Some((default_token, default_body));
19            } else {
20                let condition_token = self.cur().clone();
21                let condition = self.parse_expr(0)?;
22                self.expect(Type::CurlyLeft)?;
23                let mut body = vec![];
24                while self.cur().t != Type::CurlyRight {
25                    body.push(self.parse_prefix()?);
26                }
27                self.expect(Type::CurlyRight)?;
28                cases.push(((condition_token, condition), body));
29            }
30        }
31        self.expect(Type::CurlyRight)?;
32
33        let Some(default) = default else {
34            return Err(PgError::with_msg(
35                "Missing match default branch",
36                "A match statement requires a default branch",
37                &tok,
38            ));
39        };
40
41        Ok(Node::Match {
42            id: self.next_id(),
43            cases,
44            default,
45        })
46    }

Typechecking

RUST
1// just before lowering to IR in Lower::ir_from 
2let mut typechecker = typecheck::Typechecker::new();
3for node in ast {
4    let t = typechecker.node(node)?;
5    crate::trace!("{} resolved to {:?}", &node, t);
6}
7self.types = typechecker.finalise();

The purple garden type system is primitive, non-generic and based on equality. This design enables a single pass type checker with a very simple environment and an even simpler caching of already computed types.

For a match statement, the typechecker:

  1. Makes sure all conditions resolve to a bool
  2. Makes sure all branches evaluate to the same type
  3. Records this type as the canonical type for this match statement
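These three steps can be sketched as follows, assuming simplified stand-ins for purple garden's real Type and case representations:

```rust
// Sketch of an equality-based check for a match statement; Type and
// Case are simplified stand-ins, not purple garden's actual types.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Type {
    Bool,
    Int,
}

struct Case {
    cond: Type,
    body: Type,
}

// Returns the canonical type of the match, or an error message.
fn check_match(cases: &[Case], default: Type) -> Result<Type, String> {
    for case in cases {
        // 1. every condition must resolve to Bool
        if case.cond != Type::Bool {
            return Err(format!("Match conditions must be Bool, got {:?}", case.cond));
        }
        // 2. every body must agree with the default branch's type
        if case.body != default {
            return Err(format!(
                "Match cases must resolve to the same type, but got {:?} and {:?}",
                default, case.body
            ));
        }
    }
    // 3. the shared type is the canonical type of the whole match
    Ok(default)
}

fn main() {
    let ok = check_match(&[Case { cond: Type::Bool, body: Type::Int }], Type::Int);
    assert_eq!(ok, Ok(Type::Int));

    let bad = check_match(&[Case { cond: Type::Int, body: Type::Int }], Type::Int);
    assert!(bad.is_err());
}
```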

Thus in a tracing build, the typechecker prints:

TEXT
 1[34.475µs] (match
 2 (
 3  True
 4  I("1")
 5 )
 6 (
 7  I("0")
 8 )
 9)
10 resolved to Int
11[59.101µs] Finished type checking

Not conforming to the previously laid out constraints results in a pretty printed error diagnostic:

TEXT
1-> err: Non bool match condition
2   Match conditions must be Bool, got Int instead
3
4008 | match {
5009 |     5 { 1 }
6            ~ here
7010 |     { 0 }

This happens not only for non-bool conditions, but also for differing types across branches:

TEXT
1-> err: Incompatible match case return type
2   Match cases must resolve to the same type, but got Int and Bool
3
4008 | match {
5009 |     true { false }
6               ~ here
7010 |     { 5 }

The implementation is as easy as it sounds; it follows the steps laid out above.

RUST
 1Node::Match { id, cases, default } => {
 2    // short circuit for empty matches
 3    if cases.is_empty() {
 4        return Ok(Type::Void);
 5    }
 6
 7    let case_count = cases.len();
 8
 9    let mut branch_types: Vec<Option<(&Token, Type)>> =
10        vec![const { None }; case_count];
11
12    // 1. 
13    for (i, ((condition_token, condition), body)) in cases.iter().enumerate() {
14        let condition_type: Type = self.node(condition)?;
15
16        // 1. check for condition
17        if condition_type != Type::Bool {
18            return Err(PgError::with_msg(
19                "Non bool match condition",
20                format!(
21                    "Match conditions must be Bool, got {} instead",
22                    condition_type
23                ),
24                condition_token,
25            ));
26        }
27
28        // 2. collect type of the body
29        let branch_return_type = self.block_type(body)?;
30        branch_types[i] = Some((condition_token, branch_return_type));
31    }
32
33    // 2. canonical type is the type the default body resolves to
34    let first_type = self.block_type(&default.1)?;
35
36    // 2. check the types are all the same
37    for cur in &branch_types {
38        let Some((tok, ty)) = cur else { unreachable!() };
39
40        if ty != &first_type {
41            return Err(PgError::with_msg(
42                "Incompatible match case return type",
43                format!(
44                    "Match cases must resolve to the same type, but got {} and {}",
45                    first_type, ty
46                ),
47                *tok,
48            ));
49        };
50    }
51
52    // 3. record the resulting type
53    self.map.insert(*id, first_type.clone());
54
55    // 3. propagate to the caller
56    first_type
57}

Lowering to BB SSA IR

RUST
 1// as called in main()
 2let lower = ir::lower::Lower::new();
 3let mut ir = match lower.ir_from(&ast) {
 4    Ok(ir) => ir,
 5    Err(e) => {
 6        let lines = str::from_utf8(&input)
 7            .unwrap()
 8            .lines()
 9            .collect::<Vec<&str>>();
10        e.render(&lines);
11        std::process::exit(1);
12    }
13};

The intermediate representation, as introduced in Pipeline Architecture, is based on basic blocks and static single assignment. This means control flow is made up of blocks holding lists of instructions and terminated explicitly. Those instructions are, again, SSA based: each instruction performs exactly one operation and each value is defined exactly once.

In Rust types, this is represented as:

RUST
 1pub struct Block<'b> {
 2    // [...]
 3    pub id: Id,
 4    pub instructions: Vec<Instr<'b>>,
 5    pub params: Vec<Id>,
 6    pub term: Option<Terminator>,
 7}
 8
 9pub struct Func<'f> {
10    pub name: &'f str,
11    pub id: Id,
12    pub ret: Option<Type>,
13    pub blocks: Vec<Block<'f>>,
14}

Thus, in human readable form, we get:

TEXT
 1// entry
 2fn f0() -> void {
 3b0():
 4b1():
 5        %v0:Bool = true
 6        br %v0, b2(), b3()
 7b2():
 8        %v1:Int = 1
 9        jmp b4(%v1)
10b3():
11        %v2:Int = 0
12        jmp b4(%v2)
13b4(%v2):
14}

Lowering the AST to the IR requires allocating a block for each condition (b1) and each body (b2), including the default body (b3). It also requires a joining block (b4).

Each condition is lowered into its own block, and each body as well. Every condition is terminated by a Terminator::Branch jumping conditionally to its body or to the next condition (or, for the last condition, to the default block). All bodies are terminated by a Terminator::Jump to the joining block:

RUST
 1pub struct Lower<'lower> {
 2    functions: Vec<Func<'lower>>,
 3    /// current function
 4    func: Func<'lower>,
 5    /// current block
 6    block: Id,
 7    id_store: IdStore,
 8    /// maps ast variable names to ssa values
 9    env: HashMap<&'lower str, Id>,
10    func_name_to_id: HashMap<&'lower str, Id>,
11    types: HashMap<usize, ptype::Type>,
12}
13
14impl<'lower> Lower<'lower> {
15    // [...]
16
17    fn lower_node(&mut self, node: &'lower Node) -> Result<Option<Id>, PgError> {
18        Ok(match node {
19            // [...]
20            Node::Match { cases, default, id } => {
21                let mut check_blocks = Vec::with_capacity(cases.len());
22                let mut body_blocks = Vec::with_capacity(cases.len());
23
24                // pre"allocating" bbs
25                for _ in cases {
26                    check_blocks.push(self.new_block());
27                    body_blocks.push(self.new_block());
28                }
29
30                let params = self.cur().params.clone();
31
32                let default_block = self.new_block();
33
34                // the single join block, merging all value results into a single branch
35                let join = self.new_block();
36
37                for (i, ((_, condition), body)) in cases.iter().enumerate() {
38                    self.switch_to_block(check_blocks[i]);
39                    let Some(cond) = self.lower_node(condition)? else {
40                        unreachable!(
41                            "Compiler bug, match cases MUST have a condition returning a value"
42                        );
43                    };
44
45                    let no_target = if i + 1 < cases.len() {
46                        check_blocks[i + 1]
47                    } else {
48                        default_block
49                    };
50
51                    let check_block_mut = self.block_mut(check_blocks[i]);
52                    check_block_mut.term = Some(Terminator::Branch {
53                        cond,
54                        yes: (body_blocks[i], params.clone()),
55                        no: (no_target, params.clone()),
56                    });
57                    check_block_mut.params = params.clone();
58
59                    self.switch_to_block(body_blocks[i]);
60                    self.block_mut(body_blocks[i]).params = params.clone();
61                    let mut last = None;
62                    for node in body {
63                        last = self.lower_node(node)?;
64                    }
65                    let value = last.expect("match body must produce value");
66
67                    self.block_mut(body_blocks[i]).term = Some(Terminator::Jump {
68                        id: join,
69                        params: vec![value],
70                    });
71                }
72
73                // the typechecker checked we have a default case, so this is safe
74                let (_, body) = default;
75                self.switch_to_block(default_block);
76                let mut last = None;
77                for node in body.iter() {
78                    last = self.lower_node(node)?;
79                }
80                let mut default_block = self.block_mut(default_block);
81                default_block.params = params;
82                let last = last.expect("match default must produce value");
83                default_block.term = Some(Terminator::Jump {
84                    id: join,
85                    params: vec![last],
86                });
87
88                self.switch_to_block(join);
89                self.block_mut(join).params = vec![last];
90                Some(last)
91            }
92        })
93    }
94}

lower_node is called by Lower::ir_from, which creates an entry point function, creates an entry block in this function, and then lowers each node individually.

RUST
 1pub fn ir_from(mut self, ast: &'lower [Node]) -> Result<Vec<Func<'lower>>, PgError> {
 2    // [...] typechecking
 3
 4    self.func = Func {
 5        id: Id(0),
 6        name: "entry",
 7        ret: None,
 8        blocks: vec![],
 9    };
10    let entry = self.new_block();
11    self.switch_to_block(entry);
12
13    for node in ast {
14        let _ = &self.lower_node(node)?;
15        // reset to the main entry point block to keep emitting nodes into the correct context
16        self.switch_to_block(entry);
17    }
18
19    self.functions.push(self.func);
20    Ok(self.functions)
21}

Lowering to Bytecode

Lowering the intermediate representation to bytecode the virtual machine can execute works on a function by function and block by block basis.

RUST
 1// as called in main()
 2let mut cc = bc::Cc::new();
 3if let Err(e) = cc.compile(&ir) {
 4    let lines = str::from_utf8(&input)
 5        .unwrap()
 6        .lines()
 7        .collect::<Vec<&str>>();
 8    e.render(&lines);
 9    std::process::exit(1);
10};

We can now use the IR blocks and generate bytecode for each block.

TEXT
 1// entry
 2fn f0() -> void {
 3b0():
 4b1():
 5        %v0:Bool = true
 6        br %v0, b2(), b3()
 7b2():
 8        %v1:Int = 1
 9        jmp b4(%v1)
10b3():
11        %v2:Int = 0
12        jmp b4(%v2)
13b4(%v2):
14}

I have annotated the resulting bytecode disassembly with the corresponding intermediate representation instructions:

ASM
 1globals:
 2  0000:    true
 3
 400000000 <entry>:
 5; b1:
 6  0000:    load_global r0, 1
 7           ; br %v0, b2(), b3()
 8  0001:    jmpf r0, 3 <entry+0x3>
 9  0002:    jmp 6 <entry+0x6>
10
11; b2:
12           ; %v1:Int = 1
13  0003:    load_imm r1, #1
14           ; jmp b4(%v1)
15  0004:    mov r2, r1
16  0005:    jmp 8 <entry+0x8>
17
18; b3: 
19           ; %v2:Int = 0
20  0006:    load_imm r2, #0
21           ; jmp b4(%v2)
22  0007:    jmp 8 <entry+0x8>

Emitting functions and blocks

Since the IR's root construct is a function containing blocks, the bytecode backend starts by iterating functions and the blocks within them. For each block it then emits bytecode for its instructions and its terminator.

RUST
 1pub struct Cc<'cc> {
 2    pub buf: Vec<Op>,
 3    pub ctx: Context<'cc>,
 4    /// binding a block id to its pc
 5    block_map: HashMap<ir::Id, u16>,
 6    /// prefilled block id to block
 7    blocks: HashMap<ir::Id, &'cc ir::Block<'cc>>,
 8}
 9
10impl<'cc> Cc<'cc> {
11    // [...]
12
13    fn cc(&mut self, fun: &'cc Func<'cc>) 
14        -> Result<Option<reg::Reg>, PgError> {
15        // [...]
16        for block in &fun.blocks {
17            // [...]
18
19            for instruction in &block.instructions {
20                // emit bytecode for each instruction
21                self.instr(instruction);
22            }
23
24            // emit bytecode for each blocks terminator
25            self.term(block.term.as_ref());
26        }
27
28        // [...]
29    }
30}

Emitting instructions

Since this example only needs LoadConst for true, 1 and 0, the relevant extract of Cc::instr is fairly uncomplicated.

RUST
 1// purple_garden::ir
 2
 3/// Compile time Value representation, used for interning and constant
 4/// propagation
 5pub enum Const<'c> {
 6    False,
 7    True,
 8    Int(i64),
 9    Double(u64),
10    Str(&'c str),
11}
12
13pub struct Id(pub u32);
14pub struct TypeId {
15    pub id: Id,
16    pub ty: Type,
17}
18pub enum Instr<'i> {
19    // [...]
20    LoadConst { dst: TypeId, value: Const<'i> },
21}

Since LoadConst is fully typechecked, emitting bytecode for it is a matter of checking whether the constant is an integer fitting into an i32, since the vm has a load_imm instruction for exactly that.

RUST
 1// purple_garden::bc
 2
 3fn instr(&mut self, i: &ir::Instr<'cc>) {
 4    match i {
 5        ir::Instr::LoadConst { dst, value } => {
 6            let TypeId {
 7                id: ir::Id(dst), ..
 8            } = dst;
 9
10            match value {
11                Const::Int(i) if i32::try_from(*i).is_ok() => {
12                    self.emit(Op::LoadI {
13                        dst: *dst as u8,
14                        value: *i as i32,
15                    });
16                }
17                _ => {
18                    let idx = self.ctx.intern(*value);
19                    self.emit(Op::LoadG {
20                        dst: *dst as u8,
21                        idx,
22                    });
23                }
24            }
25        }
26    };
27}

All other constants are interned via Context::intern, which just makes sure the virtual machine's global pool doesn't include duplicate values.

RUST
 1pub struct Context<'ctx> {
 2    // [...]
 3    pub globals: HashMap<Const<'ctx>, usize>,
 4    pub globals_vec: Vec<Const<'ctx>>,
 5}
 6
 7impl<'ctx> Context<'ctx> {
 8    pub fn intern(&mut self, constant: Const<'ctx>) -> u32 {
 9        if let Some(&idx) = self.globals.get(&constant) {
10            return idx as u32;
11        }
12
13        let idx = self.globals_vec.len();
14        if let Const::Str(str) = constant {
15            let str_pool_idx = self.strings_vec.len() as i64;
16            self.strings_vec.push(str);
17            self.globals_vec.push(Const::Int(str_pool_idx));
18        } else {
19            self.globals_vec.push(constant);
20        };
21
22        self.globals.insert(constant, idx);
23        idx as u32
24    }
25}
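The dedup behaviour is easy to demonstrate with a stripped-down stand-in for Context::intern (string pooling omitted, names are illustrative):

```rust
use std::collections::HashMap;

// Stripped-down interner: duplicate constants map to the same
// global pool index; the string pool special case is omitted.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum Const {
    True,
    Int(i64),
}

struct Pool {
    indices: HashMap<Const, usize>,
    values: Vec<Const>,
}

impl Pool {
    fn intern(&mut self, c: Const) -> usize {
        // already interned: reuse the existing slot
        if let Some(&idx) = self.indices.get(&c) {
            return idx;
        }
        let idx = self.values.len();
        self.values.push(c);
        self.indices.insert(c, idx);
        idx
    }
}

fn main() {
    let mut pool = Pool { indices: HashMap::new(), values: Vec::new() };
    let a = pool.intern(Const::Int(42));
    let b = pool.intern(Const::True);
    let c = pool.intern(Const::Int(42)); // duplicate, reuses slot
    assert_eq!(a, c);
    assert_ne!(a, b);
    assert_eq!(pool.values.len(), 2);
}
```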

So for our instructions:

TEXT
1%v0:Bool = true
2%v1:Int = 1
3%v2:Int = 0

These compile to this bytecode:

ASM
1load_global r0, 1
2load_imm r1, #1
3load_imm r2, #0

Emitting terminators

Same as before, just for another intermediate representation construct:

RUST
 1pub enum Terminator {
 2    // [...]
 3    Jump {
 4        id: Id,
 5        params: Vec<Id>,
 6    },
 7    Branch {
 8        cond: Id,
 9        yes: (Id, Vec<Id>),
10        no: (Id, Vec<Id>),
11    },
12}

These map to bytecode just like the instructions do, but with a bit of a preamble moving the params into place for each target.

RUST
 1fn term(&mut self, t: Option<&ir::Terminator>) {
 2    let Some(term) = t else {
 3        return;
 4    };
 5
 6    match term {
 7        // [...]
 8        ir::Terminator::Jump { id, params } => {
 9            let target = *self.blocks.get(id).unwrap();
10            for (i, param) in params.iter().enumerate() {
11                let ir::Id(src) = param;
12                let ir::Id(dst) = target.params[i];
13
14                if *src == dst {
15                    continue;
16                }
17
18                self.emit(Op::Mov {
19                    dst: dst as u8,
20                    src: *src as u8,
21                });
22            }
23
24            let ir::Id(id) = id;
25            self.emit(Op::Jmp { target: *id as u16 });
26        }
27        ir::Terminator::Branch {
28            cond,
29            yes: (yes, yes_params),
30            no: (no, no_params),
31            ..
32        } => {
33            let target = *self.blocks.get(yes).unwrap();
34            for (i, param) in yes_params.iter().enumerate() {
35                let ir::Id(src) = param;
36                let ir::Id(dst) = target.params[i];
37
38                if *src == dst {
39                    continue;
40                }
41
42                self.emit(Op::Mov {
43                    dst: dst as u8,
44                    src: *src as u8,
45                });
46            }
47
48            let ir::Id(cond) = cond;
49            self.emit(Op::JmpF {
50                cond: *cond as u8,
51                target: yes.0 as u16,
52            });
53
54            let target = self.blocks[no];
55            for (i, param) in no_params.iter().enumerate() {
56                let ir::Id(src) = param;
57                let ir::Id(dst) = target.params[i];
58
59                if *src == dst {
60                    continue;
61                }
62
63                self.emit(Op::Mov {
64                    dst: dst as u8,
65                    src: *src as u8,
66                });
67            }
68
69            self.emit(Op::Jmp {
70                target: no.0 as u16,
71            });
72        }
73    }
74}

Real, but easy, example: factorial

Factorial is easy enough to reason about and implement, and it's recursive, which is nice for debugging backtraces and some other vm features:

$$ n! := \begin{cases} 1 & \textrm{if } n = 0 \\ n \cdot (n-1)! & \textrm{if } n \geq 1 \end{cases} $$

I “only” want to compute the factorial of 20, since purple garden's integers are represented as i64, and the largest fitting factorial is 20! = 2,432,902,008,176,640,000.
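This bound is easy to verify with a quick checked-arithmetic sketch (a hypothetical helper, not part of purple garden):

```rust
// Verifies that 20! is the largest factorial fitting an i64:
// checked_mul returns None as soon as the product overflows.
fn factorial_checked(n: u64) -> Option<i64> {
    (1..=n as i64).try_fold(1i64, |acc, k| acc.checked_mul(k))
}

fn main() {
    assert_eq!(factorial_checked(20), Some(2_432_902_008_176_640_000));
    assert_eq!(factorial_checked(21), None); // 21! overflows i64
}
```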

PYTHON
1fn factorial(n:int a:int) int {
2    match {
3        n == 0 { a }
4        { factorial(n-1 n*a) }
5    }
6}
7factorial(20 1)

The corresponding AST amounts to:

LISP
1(fn factorial (n:int a:int)
2  (match
3   ((== n 0) a)
4   ((factorial (- n 1) (* n a)))))->int
5(factorial 20 1)

Lowered to the intermediate representation as:

TEXT
 1// factorial
 2fn f1(%v0, %v1) -> Int {
 3b0(%v0, %v1):
 4b1(%v0, %v1):
 5        %v2:Int = 0
 6        %v3:Bool = eq %v0, %v2
 7        br %v3, b2(%v0, %v1), b3(%v0, %v1)
 8b2(%v0, %v1):
 9        jmp b4(%v1)
10b3(%v0, %v1):
11        %v4:Int = 1
12        %v5:Int = sub %v0, %v4
13        %v6:Int = mul %v0, %v1
14        %v7 = f1(%v5, %v6)
15        jmp b4(%v7)
16b4(%v7):
17        ret %v7
18}
19
20// entry
21fn f0() -> void {
22b0():
23        %v0:Int = 20
24        %v1:Int = 1
25        %v2 = f1(%v0, %v1)
26}

Again, lowered to bytecode, results in:

ASM
 100000000 <factorial>:
 2  0000:    load_imm r2, #0
 3  0001:    eq r3, r0, r2
 4  0002:    jmpf r3, 4 <factorial+0x2>
 5  0003:    jmp 6 <factorial+0x6>
 6  0004:    mov r7, r1
 7  0005:    jmp 14 <factorial+0xE>
 8  0006:    load_imm r4, #1
 9  0007:    sub r5, r0, r4
10  0008:    mul r6, r0, r1
11  0009:    mov r0, r5
12  000a:    mov r1, r6
13  000b:    call 0 <factorial>
14  000c:    mov r7, r0
15  000d:    jmp 14 <factorial+0xE>
16  000e:    mov r0, r7
17  000f:    ret
18
1900000010 <entry>:
20  0010:    load_imm r0, #20
21  0011:    load_imm r1, #1
22  0012:    call 0 <factorial>
23  0013:    mov r2, r0

Adding dbg!(vm.r[0].as_int()); to main after vm.run() shows the correct output:

TEXT
1[src/main.rs:265:5] vm.r[0].as_int() = 2432902008176640000

Compiling with release options results in a fairly quick pipeline (~700 microseconds), but that's just a micro benchmark of a uselessly simple function:

TEXT
1$ hyperfine "./target/release/purple-garden f.garden" -N --warmup 10
2Benchmark 1: ./target/release/purple-garden f.garden
3  Time (mean ± σ):     703.6 µs ±  28.5 µs    [User: 296.2 µs, System: 354.1 µs]
4  Range (min … max):   657.1 µs … 944.7 µs    3630 runs

Optimisations

There is a lot of low-hanging fruit in these examples (useless / noop blocks, function calls in tailcall position, unnecessary moves). This chapter glosses over concepts, implementation and effects for some of them, for instance indirect_jump and tailcall:

RUST
1// purple_garden::opt
2
3pub fn ir(ir: &mut [crate::ir::Func]) {
4    for fun in ir {
5        ir::indirect_jump(fun);
6        ir::tailcall(fun);
7    }
8}

Similar to the peephole optimisations I did previously, the IR optimisations are also guarded behind -O1:

RUST
1fn main() {
2    // [...]
3
4    if args.opt >= 1 {
5        opt::ir(&mut ir);
6    }
7
8    // [...]
9}

Removing Useless Blocks

The indirect_jump optimisation removes blocks that do nothing except terminate into another block, for instance b2 in factorial:

TEXT
 1fn f1(%v0, %v1) -> Int {
 2b0(%v0, %v1):
 3b1(%v0, %v1):
 4        %v2:Int = 0
 5        %v3:Bool = eq %v0, %v2
 6        br %v3, b2(%v0, %v1), b3(%v0, %v1)
 7b2(%v0, %v1):
 8        jmp b4(%v1)
 9b3(%v0, %v1):
10        %v4:Int = 1
11        %v5:Int = sub %v0, %v4
12        %v6:Int = mul %v0, %v1
13        %v7 = f1(%v5, %v6)
14        jmp b4(%v7)
15b4(%v7):
16        ret %v7
17}
  • b2 has no instructions
  • b2 has an unconditional terminator
  • b2's terminator's target is another block
  • b2 is not the function entry

Thus it can be fully omitted, requiring the branch terminator pointing to b2 to be rewritten to point to b4 instead:

DIFF
 1 b1(%v0, %v1):
 2         %v2:Int = 0
 3         %v3:Bool = eq %v0, %v2
 4-        br %v3, b2(%v0, %v1), b3(%v0, %v1)
 5+        br %v3, b4(%v1), b3(%v0, %v1)
 6 b2(%v0, %v1):
 7-        jmp b4(%v1)
 8+        <tombstone>
 9 b3(%v0, %v1):
10         %v4:Int = 1
11         %v5:Int = sub %v0, %v4

The tombstone is a marker for the codegen backends to skip generating code for a ‘dead’ block. It also enables stable block ids, which are useful for codegen and further optimisations on the remaining live blocks.

DIFF
1// purple_garden::ir
2pub struct Block<'b> {
3+   /// block is dead as a result of optimisation passes
4+   pub tombstone: bool,
5    pub id: Id,
6    pub instructions: Vec<Instr<'b>>,
7    pub params: Vec<Id>,
8    pub term: Option<Terminator>,
9}

DIFF
 1// purple_garden::bc
 2impl<'cc> Cc<'cc> {
 3    fn cc(&mut self, fun: &'cc Func<'cc>) 
 4        -> Result<Option<reg::Reg>, PgError> {
 5        // [...] prep
 6
 7        for block in &fun.blocks {
 8+           if block.tombstone {
 9+               continue;
10+           }
11
12            // [...] codegen
13        }
14}

Here is where Rust shines: a pretty pattern match on a block's terminator extracts its targets and parameters. We pattern match again, this time on the edges of the terminator (fancy speak for its jump targets), to check whether they are in indirect jumping position, and then rewrite either yes or no, or both, if any of the target blocks qualify.

RUST
 1pub fn indirect_jump(fun: &mut ir::Func) {
 2    for i in 0..fun.blocks.len() {
 3        let Some(ir::Terminator::Branch {
 4            cond,
 5            yes: (ir::Id(yes), yes_params),
 6            no: (ir::Id(no), no_params),
 7            ..
 8        }) = fun.blocks[i].term.clone()
 9        else {
10            continue;
11        };
12
13        let yes_target = &mut fun.blocks[yes as usize];
14        let yes_edge = if yes_target.instructions.is_empty() {
15            if let Some(ir::Terminator::Jump { id, params }) = &yes_target.term {
16                yes_target.tombstone = true;
17                Some((*id, params.clone()))
18            } else {
19                None
20            }
21        } else {
22            None
23        };
24
25        let no_target = &mut fun.blocks[no as usize];
26        let no_edge = if no_target.instructions.is_empty() {
27            if let Some(ir::Terminator::Jump { id, params }) = &no_target.term {
28                no_target.tombstone = true;
29                Some((*id, params.clone()))
30            } else {
31                None
32            }
33        } else {
34            None
35        };
36
37        fun.blocks[i].term = Some(ir::Terminator::Branch {
38            cond,
39            yes: yes_edge.unwrap_or((ir::Id(yes), yes_params)),
40            no: no_edge.unwrap_or((ir::Id(no), no_params)),
41        });
42    }
43}

Tail call optimisation (FUTURE)

Since factorial with an accumulator is embarrassingly tailcallable1, we need a pass to exploit that.

This and the section below are the subject of the next blog article.

Smarter register usage (FUTURE)

In our factorial example there are a few obvious cases in which instructions could write to their target registers directly, instead of writing to temporary registers and moving the results into the respective registers afterwards:

ASM
1  0007:    sub r5, r0, r4
2  0008:    mul r6, r0, r1
3  0009:    mov r0, r5
4  000a:    mov r1, r6

And also unnecessary moves upon crossing block boundaries:

ASM
1  000c:    mov r7, r0
2  000d:    jmp 14 <factorial+0xE>
3  000e:    mov r0, r7

  1. It even is THE example when looking into LLVM's tailcall pass: https://gist.github.com/vzyrianov/19cad1d2fdc2178c018d79ab6cd4ef10#examples ↩︎