You should write your own programming language

· 956 words · 5 minute read

I recently started writing an interpreted programming language inspired by lisp. Mainly because its easy to parse the expressions :^). I named it after my girlfriend and got started. Its designed around single character operators, such as .:+-*/^%.

After implementing the first few features I wanted to somewhat express the fun i had while writing this obscure language in this post while explaining why I think everyone should at least try to implement a subset of a programming language. Even if the language is, like mine, a little bit esoteric.

Without going into too much detail, here is a short showcase of the language:

 1;; print arguments to stdout
 2[. "hello world"]
 5    [+
 6        [* 2 2]
 7        2
 8        3
 9    ]
10    [^ 5
11        [- 2 1]
12    ]
14;; 9 10
16[: pi 3.1415]
18    [* pi 5]
20;; 15.7075000000000001

I know its a pain to read or write. It also has a lot of quirks and weird design decisions:

  • so many braces
  • no S-Expressions, but M-Expressions
  • no functions
  • somehow everything is an array and simultaneously isn’t
  • lisp without linked lists
  • only pretty errors in the lexer, everything else is just an error message:
 1sophia -exp '[. "hello world]'
 2# 13:47:45 err: Unterminated String at [l 1:3]
 4# 001 |   [. "test]
 5#            ^^^^^^
 6# 13:49:47 lexer error
 8sophia -exp '[? "test"]'
 9# 13:48:37 err: Unknown token '?' at [l 1:1]
11# 001 |   [? "test"]
12#          ^
13# 13:49:47 lexer error
15sophia -exp '[. "test"'
16# 13:49:47 err: Missing statement end
17#   - Expected Token ']' got 'EOF' [l: 0:0]
18# 13:49:47 parser error
20sophia -exp '[+ 1 "b"]'
21# 13:50:46 err: can not use variable
22#   of type string in current operation (+)
23#   , expected float64 for value "b"
24# 13:50:46 runtime error
  • no control flow
  • only strings and floats (float64)
  • variables only store the first argument after the identifier (don’t @me, the parser isn’t done yet)
  • i stole the . from forth
  • expressions (everything between []) are evaluated right to left, for instance the power operator is evaluated like so:
    1. Assign first child of the power expression to a variable called result
    2. Raise the result to the power of the next child and assign the result to result
    3. Profit (not really)
  • all operators (the first token in []) are single character

As one can see my to-do list is full for the next few months.

Why should i write a programming language ##

I think the benefit of implementing a programming language or even a subset like I did can be categorized into the following chapters.

Of course lets not forget the joy of writing a programming language, you provide your idea of an instruction set and the computer executes and prints what you specified.

Programming language experience ###

In my opinion i learned a lot about go and java (yeah i know 🤢) while working on several interpreters. You can learn a lot about iterators, the standard library, string and character manipulation as well as i/o when implementing your own programming language.

Some books and resources i found very helpful for learning go:

Data structures ###

You will need to store a list of tokens in a list which size you can’t know at compile time, therefore you will need a variable size list, such as a Slice or an ArrayList.

You also have to keep track of built-in keywords or variables you encounter while parsing, both of these use cases cry for a map or a HashMap.

You probably are implementing a tree walk interpreter, for this to work you are building a tree out of scanned tokens, which also is a data structure.

Not only will you have to use data structures when implementing your programming language, but you will have to understand and use them correctly to have a somewhat good time.

For data structures I can recommend the course The Last Algorithms Course You’ll Need by ThePrimeagen or A Programmer’s Guider to Computer Science Vol.1

Design patterns ###

No I am not talking about SOLID, dependency injection or implementing the abstract BuilderFactory, I’m referring to the useful design patters, such as:

I use the iterator pattern to iterate over tokens and ast-Nodes. I make use of the visitor pattern to invoke the .Eval() function on the root of the AST, which will then evaluate all nodes until the leaf is hit. This creates a clean and nice looking code base and keeps the mental overhead low by separating creation and evaluation.

Understanding the basics ###

The thing I love most about tinkering with implementing a programming language is the theory behind it. Lexical and syntax analysis, parsing the token stream to an abstract syntax tree and evaluating it is just a lot of fun.

When implementing a programming language you will learn about:

  • lexical analysis (assigning meaning to characters)
  • syntax analysis (checking if the lexed / scanned stream matches the grammar you desire)
  • building a tree of nodes which represent the structure of your source program
  • tree traversal to evaluate this structure
  • a lot of error handling
    • errors when lexing (unknown characters, not terminated strings, etc)
    • errors when parsing (braces at the wrong place, missing semicolon, etc.)
    • errors when evaluating (trying to do arithmetics with strings, calling undefined functions or using undefined variables)

For writing interpreters i can recommend:

For reading about the theory behind writing programming languages i can recommend the dragon book: