I like Go's headless switch statements as a replacement for if/else-if/else chains:
import ("time"; "fmt")
t := time.Now()
switch {
case t.Hour() < 12:
	fmt.Println("Good morning!")
case t.Hour() < 17:
	fmt.Println("Good afternoon.")
default:
	fmt.Println("Good evening.")
}

So I decided purple garden will have these as the singular control structure. For instance, the above would be written as:
import ("time" "io")
let t = time.now()
let greeting = match {
    t.hour() < 12 { "morning" }
    t.hour() < 17 { "afternoon" }
    { "evening" }
}
io.println("Good" greeting)

Match statements
Below is the easiest and most useless match statement there is, for converting a boolean to its integer representation:
match {
    true { 1 }
    { 0 }
}

The general format is a conditional case evaluating to a boolean, followed by a body. All bodies must resolve to the same type, and a default branch is required.
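The evaluation order can be modeled as a chain of conditions tried top to bottom, with the default body as fallback. A minimal Rust sketch of that semantics (eval_match is a hypothetical helper for illustration, not part of purple garden):

```rust
// Hypothetical model of the match semantics described above: conditions
// are tried top to bottom, the first one that holds selects its body,
// and the default body runs when none hold.
fn eval_match<T: Copy>(cases: &[(bool, T)], default: T) -> T {
    for &(cond, value) in cases {
        if cond {
            return value;
        }
    }
    default
}

fn main() {
    // the bool-to-int example: `match { true { 1 } { 0 } }`
    println!("{}", eval_match(&[(true, 1)], 0)); // prints 1
}
```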
Pipeline Architecture
Purple garden's architecture revolves around an intermediate representation: a list of functions, each holding a list of blocks. Each block has a list of input params, a list of instructions and a single terminator. The instructions are SSA-based and the blocks containing them are basic blocks, meaning each value is immutable and defined exactly once. This also means params to blocks and params in terminators are explicit (which enables omission of phi nodes).
The IR sits at the intersection of the abstract syntax tree produced by parsing the tokenized input and the three backends (currently only the bytecode backend targeting the typed register-based virtual machine is implemented). This architecture enables decoupled codegen and a list of optimisations.
Parsing
Parsing consumes the tokens produced by lexical analysis / tokenisation and builds a tree representing the source code as a concept.
// as called in main()
let mut lexer = Lexer::new(&input);
let ast = match Parser::new(&mut lexer).and_then(|n| n.parse()) {
    Ok(a) => a,
    Err(e) => {
        let lines = str::from_utf8(&input)
            .unwrap()
            .lines()
            .collect::<Vec<&str>>();
        e.render(&lines);
        std::process::exit(1);
    }
};

As shown in the intro, the match statement follows this format:
match {
    <cond> { <body> }
    { <default> }
}

Expr ::= ...
Block ::= "{" Expr "}"
MatchStmt ::= "match" "{" (Expr Block)+ Block "}"

Pg uses a combination of recursive descent and Pratt parsing. I will focus on the former here, since the latter doesn't apply.
match   + Parser::parser
match   \_ Parser::parser
match   \_ Parser::parse_prefix
match   \_ Parser::parse_match
True    |\_ Parser::parse_expr
        |  |
I("1")  |  \_ Parser::parse_prefix
I("1")  |  \_ Parser::parse_expr
        |
I("0")  \_ Parser::parse_prefix
I("0")  \_ Parser::parse_expr

As shown above, the call stack for our example shows all function calls necessary to build the abstract syntax tree:
(match
    (True I("1"))
    (I("0"))
)

Below is the implementation of Parser::parse_match:
fn parse_match(&mut self) -> Result<Node<'p>, PgError> {
    self.next()?;
    let mut cases = vec![];
    let mut default = None;
    let tok = self.cur().clone();

    self.expect(Type::CurlyLeft)?;
    while self.cur().t != Type::CurlyRight {
        // default case
        if self.cur().t == Type::CurlyLeft {
            let default_token = self.cur().clone();
            self.expect(Type::CurlyLeft)?;
            let mut default_body = vec![];
            while self.cur().t != Type::CurlyRight {
                default_body.push(self.parse_prefix()?);
            }
            self.expect(Type::CurlyRight)?;
            default = Some((default_token, default_body));
        } else {
            let condition_token = self.cur().clone();
            let condition = self.parse_expr(0)?;
            self.expect(Type::CurlyLeft)?;
            let mut body = vec![];
            while self.cur().t != Type::CurlyRight {
                body.push(self.parse_prefix()?);
            }
            self.expect(Type::CurlyRight)?;
            cases.push(((condition_token, condition), body));
        }
    }
    self.expect(Type::CurlyRight)?;

    let Some(default) = default else {
        return Err(PgError::with_msg(
            "Missing match default branch",
            "A match statement requires a default branch",
            &tok,
        ));
    };

    Ok(Node::Match {
        id: self.next_id(),
        cases,
        default,
    })
}

Typechecking
// just before lowering to IR in Lower::ir_from
let mut typechecker = typecheck::Typechecker::new();
for node in ast {
    let t = typechecker.node(node)?;
    crate::trace!("{} resolved to {:?}", &node, t);
}
self.types = typechecker.finalise();

The purple garden type system is primitive, non-generic and based on equality. This design enables a single-pass type checker with a very simple environment and an even simpler cache of already computed types.
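Equality-based checking needs very little machinery: two types are compatible exactly when they are equal, and a resolved node type can be cached by node id so it is computed once. A hedged sketch of the idea with invented names, not the real Typechecker:

```rust
use std::collections::HashMap;

// Invented minimal types for illustration only.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Ty {
    Int,
    Bool,
}

struct Cache {
    // node id -> resolved type, so each node is only checked once
    map: HashMap<usize, Ty>,
}

impl Cache {
    // equality-based "unification": succeed iff both sides are the same type
    fn check(&mut self, node_id: usize, expected: Ty, got: Ty) -> Result<Ty, String> {
        if expected != got {
            return Err(format!("expected {:?}, got {:?}", expected, got));
        }
        self.map.insert(node_id, got);
        Ok(got)
    }
}
```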
For a match statement, the typechecker:
- Makes sure all conditions resolve to a bool
- Makes sure all branches evaluate to the same type
- Records this type as the canonical type for the match statement
Thus in a tracing build, the typechecker prints:
[34.475µs] (match
    (
        True
        I("1")
    )
    (
        I("0")
    )
)
 resolved to Int
[59.101µs] Finished type checking

Not conforming to the previously laid out constraints results in a pretty printed error diagnostic:
-> err: Non bool match condition
   Match conditions must be Bool, got Int instead

008 | match {
009 |     5 { 1 }
      ~ here
010 |     { 0 }

Not only for non bool conditions, but also for differing types in different branches:
-> err: Incompatible match case return type
   Match cases must resolve to the same type, but got Int and Bool

008 | match {
009 |     true { false }
      ~ here
010 |     { 5 }

The implementation is as easy as it sounds; it follows the steps mentioned before.
Node::Match { id, cases, default } => {
    // short circuit for empty matches
    if cases.is_empty() {
        return Ok(Type::Void);
    }

    let case_count = cases.len();

    let mut branch_types: Vec<Option<(&Token, Type)>> =
        vec![const { None }; case_count];

    // 1.
    for (i, ((condition_token, condition), body)) in cases.iter().enumerate() {
        let condition_type: Type = self.node(condition)?;

        // 1. check for condition
        if condition_type != Type::Bool {
            return Err(PgError::with_msg(
                "Non bool match condition",
                format!(
                    "Match conditions must be Bool, got {} instead",
                    condition_type
                ),
                condition_token,
            ));
        }

        // 2. collect type of the body
        let branch_return_type = self.block_type(body)?;
        branch_types[i] = Some((condition_token, branch_return_type));
    }

    // 2. canonical type is the type the default body resolves to
    let first_type = self.block_type(&default.1)?;

    // 2. check the types are all the same
    for cur in &branch_types {
        let Some((tok, ty)) = cur else { unreachable!() };

        if ty != &first_type {
            return Err(PgError::with_msg(
                "Incompatible match case return type",
                format!(
                    "Match cases must resolve to the same type, but got {} and {}",
                    first_type, ty
                ),
                *tok,
            ));
        };
    }

    // 3. record the resulting type
    self.map.insert(*id, first_type.clone());

    // 3. propagate to the caller
    first_type
}

Lowering to BB SSA IR
// as called in main()
let lower = ir::lower::Lower::new();
let mut ir = match lower.ir_from(&ast) {
    Ok(ir) => ir,
    Err(e) => {
        let lines = str::from_utf8(&input)
            .unwrap()
            .lines()
            .collect::<Vec<&str>>();
        e.render(&lines);
        std::process::exit(1);
    }
};

The intermediate representation, as introduced in Pipeline Architecture, is based on basic blocks and static single assignment. Control flow is made up of blocks that hold lists of instructions and are terminated explicitly. Those instructions are, again, SSA-based: every instruction produces exactly one value, and each value is defined only once.
In Rust types, this is represented as:
pub struct Block<'b> {
    // [...]
    pub id: Id,
    pub instructions: Vec<Instr<'b>>,
    pub params: Vec<Id>,
    pub term: Option<Terminator>,
}

pub struct Func<'f> {
    pub name: &'f str,
    pub id: Id,
    pub ret: Option<Type>,
    pub blocks: Vec<Block<'f>>,
}

Thus, in a human readable sense, we get:
// entry
fn f0() -> void {
b0():
b1():
 %v0:Bool = true
 br %v0, b2(), b3()
b2():
 %v1:Int = 1
 jmp b4(%v1)
b3():
 %v2:Int = 0
 jmp b4(%v2)
b4(%v2):
}

Lowering the AST to the IR requires allocating a block for each condition (b1), a block for each body (b2), including the default body (b3), and a joining block (b4).
Each condition is lowered into its own block, and each body likewise. Every condition that is followed by another condition is terminated by a Terminator::Branch jumping conditionally to its body or to the next condition. All bodies are terminated by a Terminator::Jump to the joining block:
pub struct Lower<'lower> {
    functions: Vec<Func<'lower>>,
    /// current function
    func: Func<'lower>,
    /// current block
    block: Id,
    id_store: IdStore,
    /// maps ast variable names to ssa values
    env: HashMap<&'lower str, Id>,
    func_name_to_id: HashMap<&'lower str, Id>,
    types: HashMap<usize, ptype::Type>,
}

impl<'lower> Lower<'lower> {
    // [...]

    fn lower_node(&mut self, node: &'lower Node) -> Result<Option<Id>, PgError> {
        Ok(match node {
            // [...]
            Node::Match { cases, default, id } => {
                let mut check_blocks = Vec::with_capacity(cases.len());
                let mut body_blocks = Vec::with_capacity(cases.len());

                // pre"allocating" bbs
                for _ in cases {
                    check_blocks.push(self.new_block());
                    body_blocks.push(self.new_block());
                }

                let params = self.cur().params.clone();

                let default_block = self.new_block();

                // the single join block, merging all value results into a single branch
                let join = self.new_block();

                for (i, ((_, condition), body)) in cases.iter().enumerate() {
                    self.switch_to_block(check_blocks[i]);
                    let Some(cond) = self.lower_node(condition)? else {
                        unreachable!(
                            "Compiler bug, match cases MUST have a condition returning a value"
                        );
                    };

                    let no_target = if i + 1 < cases.len() {
                        check_blocks[i + 1]
                    } else {
                        default_block
                    };

                    let check_block_mut = self.block_mut(check_blocks[i]);
                    check_block_mut.term = Some(Terminator::Branch {
                        cond,
                        yes: (body_blocks[i], params.clone()),
                        no: (no_target, params.clone()),
                    });
                    check_block_mut.params = params.clone();

                    self.switch_to_block(body_blocks[i]);
                    self.block_mut(body_blocks[i]).params = params.clone();
                    let mut last = None;
                    for node in body {
                        last = self.lower_node(node)?;
                    }
                    let value = last.expect("match body must produce value");

                    self.block_mut(body_blocks[i]).term = Some(Terminator::Jump {
                        id: join,
                        params: vec![value],
                    });
                }

                // the typechecker checked we have a default case, so this is safe
                let (_, body) = default;
                self.switch_to_block(default_block);
                let mut last = None;
                for node in body.iter() {
                    last = self.lower_node(node)?;
                }
                let default_block = self.block_mut(default_block);
                default_block.params = params;
                let last = last.expect("match default must produce value");
                default_block.term = Some(Terminator::Jump {
                    id: join,
                    params: vec![last],
                });

                self.switch_to_block(join);
                self.block_mut(join).params = vec![last];
                Some(last)
            }
        })
    }
}

lower_node is called by Lower::ir_from, which creates an entry point function, creates an entry block in it, and then lowers each node individually.
pub fn ir_from(mut self, ast: &'lower [Node]) -> Result<Vec<Func<'lower>>, PgError> {
    // [...] typechecking

    self.func = Func {
        id: Id(0),
        name: "entry",
        ret: None,
        blocks: vec![],
    };
    let entry = self.new_block();
    self.switch_to_block(entry);

    for node in ast {
        let _ = &self.lower_node(node)?;
        // reset to the main entry point block to keep emitting nodes into the correct context
        self.switch_to_block(entry);
    }

    self.functions.push(self.func);
    Ok(self.functions)
}

Lowering to Bytecode
Lowering the intermediate representation to bytecode the virtual machine can execute works on a function-by-function and block-by-block basis.
// as called in main()
let mut cc = bc::Cc::new();
if let Err(e) = cc.compile(&ir) {
    let lines = str::from_utf8(&input)
        .unwrap()
        .lines()
        .collect::<Vec<&str>>();
    e.render(&lines);
    std::process::exit(1);
};

We can now use the IR blocks and generate bytecode for each block.
// entry
fn f0() -> void {
b0():
b1():
 %v0:Bool = true
 br %v0, b2(), b3()
b2():
 %v1:Int = 1
 jmp b4(%v1)
b3():
 %v2:Int = 0
 jmp b4(%v2)
b4(%v2):
}

I have annotated the resulting bytecode disassembly with the corresponding intermediate representation instructions:
globals:
  0000: true

00000000 <entry>:
; b1:
  0000: load_global r0, 1
  ; br %v0, b2(), b3()
  0001: jmpf r0, 3 <entry+0x3>
  0002: jmp 6 <entry+0x6>

; b2:
  ; %v1:Int = 1
  0003: load_imm r1, #1
  ; jmp b4(%v1)
  0004: mov r2, r1
  0005: jmp 8 <entry+0x8>

; b3:
  ; %v2:Int = 0
  0006: load_imm r2, #0
  ; jmp b4(%v2)
  0007: jmp 8 <entry+0x8>

Emitting functions and blocks
Since the IR's root construct is a function containing blocks, the bytecode backend starts by iterating over functions and over the blocks within each function. For each block it then emits bytecode for its instructions and its terminator.
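One wrinkle in such a flat emission scheme: a terminator can jump to a block that has not been emitted yet, so its pc is unknown at emission time. One common approach, which the block_map field hints at, is to emit jump targets as block ids first and patch them to pcs once every block's start is recorded. A hedged sketch with an invented Op shape, not the real backend:

```rust
use std::collections::HashMap;

// Invented, simplified op shape for illustration; the real Op differs.
#[derive(Debug, PartialEq)]
enum Op {
    LoadImm(u8, i32),
    // target initially holds a block id, later a pc
    Jmp(u16),
}

// Rewrite block-id jump targets into concrete pcs after all blocks
// have been emitted and their start pcs are known.
fn patch_jumps(buf: &mut [Op], block_to_pc: &HashMap<u16, u16>) {
    for op in buf {
        if let Op::Jmp(target) = op {
            *target = block_to_pc[target];
        }
    }
}
```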
pub struct Cc<'cc> {
    pub buf: Vec<Op>,
    pub ctx: Context<'cc>,
    /// binding a block id to its pc
    block_map: HashMap<ir::Id, u16>,
    /// prefilled block id to block
    blocks: HashMap<ir::Id, &'cc ir::Block<'cc>>,
}

impl<'cc> Cc<'cc> {
    // [...]

    fn cc(&mut self, fun: &'cc Func<'cc>)
        -> Result<Option<reg::Reg>, PgError> {
        // [...]
        for block in &fun.blocks {
            // [...]

            for instruction in &block.instructions {
                // emit bytecode for each instruction
                self.instr(instruction);
            }

            // emit bytecode for each blocks terminator
            self.term(block.term.as_ref());
        }

        // [...]
    }
}

Emitting instructions
Since this example only needs LoadConst (for true, 1 and 0), the relevant extract of Cc::instr is fairly uncomplicated.
// purple_garden::ir

/// Compile time Value representation, used for interning and constant
/// propagation
pub enum Const<'c> {
    False,
    True,
    Int(i64),
    Double(u64),
    Str(&'c str),
}

pub struct Id(pub u32);
pub struct TypeId {
    pub id: Id,
    pub ty: Type,
}
pub enum Instr<'i> {
    // [...]
    LoadConst { dst: TypeId, value: Const<'i> },
}

Since LoadConst is fully typechecked, emitting bytecode for it is a matter of checking whether the constant is an integer that fits into an i32, since the vm has a load_imm instruction.
// purple_garden::bc

fn instr(&mut self, i: &ir::Instr<'cc>) {
    match i {
        ir::Instr::LoadConst { dst, value } => {
            let TypeId {
                id: ir::Id(dst), ..
            } = dst;

            match value {
                // integers fitting into an i32 are loaded as immediates
                Const::Int(i) if i32::try_from(*i).is_ok() => {
                    self.emit(Op::LoadI {
                        dst: *dst as u8,
                        value: *i as i32,
                    });
                }
                _ => {
                    let idx = self.ctx.intern(*value);
                    self.emit(Op::LoadG {
                        dst: *dst as u8,
                        idx,
                    });
                }
            }
        }
    };
}

All other constants are interned via Context::intern, which just makes sure the virtual machine's global pool doesn't include duplicate values.
pub struct Context<'ctx> {
    // [...]
    pub globals: HashMap<Const<'ctx>, usize>,
    pub globals_vec: Vec<Const<'ctx>>,
}

impl<'ctx> Context<'ctx> {
    pub fn intern(&mut self, constant: Const<'ctx>) -> u32 {
        if let Some(&idx) = self.globals.get(&constant) {
            return idx as u32;
        }

        let idx = self.globals_vec.len();
        if let Const::Str(str) = constant {
            let str_pool_idx = self.strings_vec.len() as i64;
            self.strings_vec.push(str);
            self.globals_vec.push(Const::Int(str_pool_idx));
        } else {
            self.globals_vec.push(constant);
        };

        self.globals.insert(constant, idx);
        idx as u32
    }
}

So for our instructions:
%v0:Bool = true
%v1:Int = 1
%v2:Int = 0

This compiles to the following bytecode:
load_global r0, 1
load_imm r1, #1
load_imm r2, #0

Emitting terminators
Same as before, just for another intermediate representation construct:
pub enum Terminator {
    // [...]
    Jump {
        id: Id,
        params: Vec<Id>,
    },
    Branch {
        cond: Id,
        yes: (Id, Vec<Id>),
        no: (Id, Vec<Id>),
    },
}

These map to bytecode much like the instructions do, but with a bit of a preamble moving the params for each target.
fn term(&mut self, t: Option<&ir::Terminator>) {
    let Some(term) = t else {
        return;
    };

    match term {
        // [...]
        ir::Terminator::Jump { id, params } => {
            let target = *self.blocks.get(id).unwrap();
            for (i, param) in params.iter().enumerate() {
                let ir::Id(src) = param;
                let ir::Id(dst) = target.params[i];

                if *src == dst {
                    continue;
                }

                self.emit(Op::Mov {
                    dst: dst as u8,
                    src: *src as u8,
                });
            }

            let ir::Id(id) = id;
            self.emit(Op::Jmp { target: *id as u16 });
        }
        ir::Terminator::Branch {
            cond,
            yes: (yes, yes_params),
            no: (no, no_params),
            ..
        } => {
            let target = *self.blocks.get(yes).unwrap();
            for (i, param) in yes_params.iter().enumerate() {
                let ir::Id(src) = param;
                let ir::Id(dst) = target.params[i];

                if *src == dst {
                    continue;
                }

                self.emit(Op::Mov {
                    dst: dst as u8,
                    src: *src as u8,
                });
            }

            let ir::Id(cond) = cond;
            self.emit(Op::JmpF {
                cond: *cond as u8,
                target: yes.0 as u16,
            });

            let target = self.blocks[no];
            for (i, param) in no_params.iter().enumerate() {
                let ir::Id(src) = param;
                let ir::Id(dst) = target.params[i];

                if *src == dst {
                    continue;
                }

                self.emit(Op::Mov {
                    dst: dst as u8,
                    src: *src as u8,
                });
            }

            self.emit(Op::Jmp {
                target: no.0 as u16,
            });
        }
    }
}

Real, but easy, example: factorial
Factorial is easy enough to reason about and implement, and it's recursive, which is nice for debugging backtraces and some other vm features:

$$ n! := \begin{cases} 1 & \textrm{if } n = 0 \\ n \cdot (n-1)! & \textrm{if } n \geq 1 \end{cases} $$

I “only” want to compute the first 20 values, since purple garden's integers are represented as i64, so the largest fitting factorial is 20! = 2,432,902,008,176,640,000.
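That bound is easy to verify with checked arithmetic; a quick Rust sanity check (not part of purple garden):

```rust
// Returns Some(n!) while the running product fits into an i64,
// None as soon as it overflows.
fn factorial_checked(n: i64) -> Option<i64> {
    (1..=n).try_fold(1i64, |acc, k| acc.checked_mul(k))
}

fn main() {
    // 20! still fits, 21! overflows an i64
    println!("{:?}", factorial_checked(20)); // Some(2432902008176640000)
    println!("{:?}", factorial_checked(21)); // None
}
```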
fn factorial(n:int a:int) int {
    match {
        n == 0 { a }
        { factorial(n-1 n*a) }
    }
}
factorial(20 1)

The corresponding AST amounts to:
(fn factorial (n:int a:int)
    (match
        ((== n 0) a)
        ((factorial (- n 1) (* n a)))))->int
(factorial 20 1)

Lowered to the intermediate representation:
// factorial
fn f1(%v0, %v1) -> Int {
b0(%v0, %v1):
b1(%v0, %v1):
 %v2:Int = 0
 %v3:Bool = eq %v0, %v2
 br %v3, b2(%v0, %v1), b3(%v0, %v1)
b2(%v0, %v1):
 jmp b4(%v1)
b3(%v0, %v1):
 %v4:Int = 1
 %v5:Int = sub %v0, %v4
 %v6:Int = mul %v0, %v1
 %v7 = f1(%v5, %v6)
 jmp b4(%v7)
b4(%v7):
 ret %v7
}

// entry
fn f0() -> void {
b0():
 %v0:Int = 20
 %v1:Int = 1
 %v2 = f1(%v0, %v1)
}

Again, lowered to bytecode, this results in:
00000000 <factorial>:
  0000: load_imm r2, #0
  0001: eq r3, r0, r2
  0002: jmpf r3, 4 <factorial+0x2>
  0003: jmp 6 <factorial+0x6>
  0004: mov r7, r1
  0005: jmp 14 <factorial+0xE>
  0006: load_imm r4, #1
  0007: sub r5, r0, r4
  0008: mul r6, r0, r1
  0009: mov r0, r5
  000a: mov r1, r6
  000b: call 0 <factorial>
  000c: mov r7, r0
  000d: jmp 14 <factorial+0xE>
  000e: mov r0, r7
  000f: ret

00000010 <entry>:
  0010: load_imm r0, #20
  0011: load_imm r1, #1
  0012: call 0 <factorial>
  0013: mov r2, r0

Adding dbg!(vm.r[0].as_int()); to main after vm.run() shows the correct output:
[src/main.rs:265:5] vm.r[0].as_int() = 2432902008176640000

Compiling with release options results in a fairly quick pipeline (~700 microseconds), but that's just a micro benchmark for a uselessly simple function:
$ hyperfine "./target/release/purple-garden f.garden" -N --warmup 10
Benchmark 1: ./target/release/purple-garden f.garden
  Time (mean ± σ):     703.6 µs ±  28.5 µs    [User: 296.2 µs, System: 354.1 µs]
  Range (min … max):   657.1 µs … 944.7 µs    3630 runs

Optimisations
There is a lot of low-hanging fruit in these examples (useless / noop blocks, function calls in tail-call position, unnecessary moves). This chapter glosses over concepts, implementation and effects for some of them, for instance indirect_jump and tailcall:
// purple_garden::opt

pub fn ir(ir: &mut [crate::ir::Func]) {
    for fun in ir {
        ir::indirect_jump(fun);
        ir::tailcall(fun);
    }
}

Similar to the peephole optimisations I did previously, the ir optimisations are also guarded behind -O1:
fn main() {
    // [...]

    if args.opt >= 1 {
        opt::ir(&mut ir);
    }

    // [...]
}

Removing Useless Blocks
The indirect_jump optimisation removes blocks that do nothing except terminate into another block, for instance b2 in factorial:
fn f1(%v0, %v1) -> Int {
b0(%v0, %v1):
b1(%v0, %v1):
 %v2:Int = 0
 %v3:Bool = eq %v0, %v2
 br %v3, b2(%v0, %v1), b3(%v0, %v1)
b2(%v0, %v1):
 jmp b4(%v1)
b3(%v0, %v1):
 %v4:Int = 1
 %v5:Int = sub %v0, %v4
 %v6:Int = mul %v0, %v1
 %v7 = f1(%v5, %v6)
 jmp b4(%v7)
b4(%v7):
 ret %v7
}

b2 qualifies because:
- b2 has no instructions
- b2 has an unconditional terminator
- b2's terminator's target is another block
- b2 is not the function entry
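Those criteria translate directly into a predicate; a sketch over invented minimal block shapes (the real ir::Block carries more fields and the Jump terminator carries its target id):

```rust
// Invented, minimal stand-ins for the real IR types, illustration only.
enum Term {
    Jump,   // unconditional
    Branch, // conditional
}

struct Block {
    instructions: Vec<u32>,
    term: Option<Term>,
    is_entry: bool,
}

// A block can be jumped "through" when it only forwards control flow:
// no instructions, an unconditional terminator, and it is not the entry.
fn is_indirect_jump(b: &Block) -> bool {
    b.instructions.is_empty()
        && matches!(b.term, Some(Term::Jump))
        && !b.is_entry
}
```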
Thus it can be fully omitted, requiring the branch terminator pointing to b2 to point to b4 instead:
  b1(%v0, %v1):
    %v2:Int = 0
    %v3:Bool = eq %v0, %v2
-   br %v3, b2(%v0, %v1), b3(%v0, %v1)
+   br %v3, b4(%v1), b3(%v0, %v1)
  b2(%v0, %v1):
-   jmp b4(%v1)
+   <tombstone>
  b3(%v0, %v1):
    %v4:Int = 1
    %v5:Int = sub %v0, %v4
The tombstone is a marker telling the codegen backends to skip generating code for a ‘dead’ block. It enables stable block ids, which are useful for codegen and further optimisation passes on live blocks.
// purple_garden::ir
pub struct Block<'b> {
+   /// block is dead as a result of optimisation passes
+   pub tombstone: bool,
    pub id: Id,
    pub instructions: Vec<Instr<'b>>,
    pub params: Vec<Id>,
    pub term: Option<Terminator>,
}
// purple_garden::bc
impl<'cc> Cc<'cc> {
    fn cc(&mut self, fun: &'cc Func<'cc>)
        -> Result<Option<reg::Reg>, PgError> {
        // [...] prep

        for block in &fun.blocks {
+           if block.tombstone {
+               continue;
+           }

            // [...] codegen
        }
    }
}
Here is where Rust shines: a pretty pattern match on a block's terminator, extracting its targets and parameters. Pattern matching again, this time on the edges of the terminator (fancy speak for its targets), checks whether they are in indirect-jump positions and then rewrites either yes or no, or both, if any of the target blocks qualify.
pub fn indirect_jump(fun: &mut ir::Func) {
    for i in 0..fun.blocks.len() {
        let Some(ir::Terminator::Branch {
            cond,
            yes: (ir::Id(yes), yes_params),
            no: (ir::Id(no), no_params),
            ..
        }) = fun.blocks[i].term.clone()
        else {
            continue;
        };

        let yes_target = &mut fun.blocks[yes as usize];
        let yes_edge = if yes_target.instructions.is_empty() {
            if let Some(ir::Terminator::Jump { id, params }) = &yes_target.term {
                yes_target.tombstone = true;
                Some((*id, params.clone()))
            } else {
                None
            }
        } else {
            None
        };

        let no_target = &mut fun.blocks[no as usize];
        let no_edge = if no_target.instructions.is_empty() {
            if let Some(ir::Terminator::Jump { id, params }) = &no_target.term {
                no_target.tombstone = true;
                Some((*id, params.clone()))
            } else {
                None
            }
        } else {
            None
        };

        fun.blocks[i].term = Some(ir::Terminator::Branch {
            cond,
            yes: yes_edge.unwrap_or((ir::Id(yes), yes_params)),
            no: no_edge.unwrap_or((ir::Id(no), no_params)),
        });
    }
}

Tail call optimisation (FUTURE)
Since factorial with an accumulator is embarrassingly tailcallable1, we need a pass to make it one.
This and the section below are the subject of the next blog article.
Smarter register usage (FUTURE)
In our factorial example there are a few obvious cases in which instructions could write to their target registers directly, instead of writing to temporary registers and moving the results afterwards:
  0007: sub r5, r0, r4
  0008: mul r6, r0, r1
  0009: mov r0, r5
  000a: mov r1, r6

And also unnecessary moves when crossing block boundaries:
  000c: mov r7, r0
  000d: jmp 14 <factorial+0xE>
  000e: mov r0, r7

It even is THE example when looking into LLVM's tailcall pass: https://gist.github.com/vzyrianov/19cad1d2fdc2178c018d79ab6cd4ef10#examples ↩︎