Libjson is a JSON parsing/interaction library for which I attempted to go the not-so-gopher way, disregarding interoperability and backwards compatibility:

- libjson does not support writing JSON into struct fields via reflection
- libjson does not support accessing JSON via a struct, because: see point 1
- libjson instead uses a JavaScript-like object access syntax

The three points above result in the following API:
```go
package main

import (
	"fmt"

	"github.com/xnacly/libjson"
)

func main() {
	input := `{ "hello": {"world": ["hi"] } }`
	jsonObj, _ := libjson.New(input) // or libjson.NewReader(r io.Reader)

	// accessing values
	fmt.Println(libjson.Get[string](jsonObj, ".hello.world.0")) // hi
}
```
Libjson is fast, but I want to make it even faster, so I established a baseline of the current results via `./bench.sh | rg "faster"` - it already is about 2x faster than `encoding/json`:

```text
2.10 ± 0.11 times faster than ./test -libjson=false ./1MB.json
2.33 ± 0.07 times faster than ./test -libjson=false ./5MB.json
2.37 ± 0.03 times faster than ./test -libjson=false ./10MB.json
```
## Benchmark setup

I use:

- Python to generate large JSON files, to avoid microbenchmarking
- hyperfine to compare the execution times of the two programs
- bash to put everything together
## Generating input

View the source for the benchmarking code at xnacly/libjson/test

I decided on the following JSON object to simulate real-world usage:
```json
[
  {
    "key1": "value",
    "array": [],
    "obj": {},
    "atomArray": [11201, 1e112, true, false, null, "str"]
  }
]
```
I then wrote a small Python script to generate three files (1 MB, 5 MB and 10 MB), repeating the object often enough to arrive at roughly those sizes:
```python
from os.path import exists
import math

sizes = [1, 5, 10]

line = """\t{
        "key1": "value",
        "array": [],
        "obj": {},
        "atomArray": [11201,1e112,true,false,null,"str"]
    }"""

def write_data(size: int):
    name = f"{size}MB.json"
    if not exists(name):
        with open(name, mode="w", encoding="utf8") as f:
            f.write("[\n")
            count = math.floor((size * 1000000) / len(line))
            f.write(",\n".join([line for _ in range(count)]))
            f.write("\n]")

[write_data(size) for size in sizes]
```
## Creating a program
To call a program with hyperfine, we need to write one:
```go
package main

import (
	"encoding/json"
	"flag"
	"log"
	"os"

	"github.com/xnacly/libjson"
)

func main() {
	lj := flag.Bool("libjson", true, "benchmark libjson or gojson")
	flag.Parse()
	args := flag.Args()
	if len(args) == 0 {
		log.Fatalln("Wanted a file as first argument, got nothing, exiting")
	}
	file, err := os.Open(args[0])
	if err != nil {
		log.Fatalln(err)
	}
	if *lj {
		_, err := libjson.NewReader(file)
		if err != nil {
			log.Fatalln(err)
		}
	} else {
		v := []struct {
			Key1      string
			Array     []any
			Obj       any
			AtomArray []any
		}{}
		d := json.NewDecoder(file)
		err := d.Decode(&v)
		if err != nil {
			log.Fatalln(err)
		}
	}
}
```
This program takes a flag (`-libjson`) to control whether it should use libjson or `encoding/json` as its backend.
## Calling hyperfine

Putting the above together, I came up with the following shell script:
```bash
#!/bin/bash
echo "generating example data"
python3 gen.py

echo "building executable"
rm -f ./test
go build ./test.go

hyperfine "./test ./1MB.json" "./test -libjson=false ./1MB.json"
hyperfine "./test ./5MB.json" "./test -libjson=false ./5MB.json"
hyperfine "./test ./10MB.json" "./test -libjson=false ./10MB.json"
```
## More precise benchmarks

Benchmarking via a whole program serves as a good comparison between different implementations and their total runtimes, but it is very imprecise regarding the allocations and the time spent in the functions we are interested in. To create a more precise and relevant benchmark, I will also include the results of the following benchmarks, which use the `testing.B` tooling:
```go
package libjson

import (
	"encoding/json"
	"strings"
	"testing"

	"github.com/stretchr/testify/assert"
)

const amount = 50_000

func BenchmarkLibJson(b *testing.B) {
	data := strings.Repeat(`{"key1": "value","array": [],"obj": {},"atomArray": [11201,1e112,true,false,null,"str"]},`, amount)
	d := []byte("[" + data[:len(data)-1] + "]")
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_, err := New(d)
		assert.NoError(b, err)
	}
	b.ReportAllocs()
}

func BenchmarkEncodingJson(b *testing.B) {
	data := strings.Repeat(`{"key1": "value","array": [],"obj": {},"atomArray": [11201,1e112,true,false,null,"str"]},`, amount)
	d := []byte("[" + data[:len(data)-1] + "]")
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		v := []struct {
			Key1      string
			Array     []any
			Obj       any
			AtomArray []any
		}{}
		err := json.Unmarshal(d, &v)
		assert.NoError(b, err)
	}
	b.ReportAllocs()
}
```
Taking both benchmarks into account, the total baseline is as follows:
```text
$ cd test && ./bench.sh
generating example data
building executable
Benchmark 1: ./test ./1MB.json
  Time (mean ± σ):      12.4 ms ±  0.6 ms    [User: 16.0 ms, System: 4.8 ms]
  Range (min … max):    11.1 ms … 14.3 ms    217 runs

Benchmark 2: ./test -libjson=false ./1MB.json
  Time (mean ± σ):      24.9 ms ±  0.6 ms    [User: 25.2 ms, System: 3.5 ms]
  Range (min … max):    23.7 ms … 26.4 ms    117 runs

Summary
  ./test ./1MB.json ran
    2.01 ± 0.11 times faster than ./test -libjson=false ./1MB.json
Benchmark 1: ./test ./5MB.json
  Time (mean ± σ):      51.8 ms ±  1.5 ms    [User: 77.5 ms, System: 12.4 ms]
  Range (min … max):    49.4 ms … 54.8 ms    57 runs

Benchmark 2: ./test -libjson=false ./5MB.json
  Time (mean ± σ):     116.7 ms ±  1.2 ms    [User: 133.1 ms, System: 9.8 ms]
  Range (min … max):   114.1 ms … 119.6 ms    25 runs

Summary
  ./test ./5MB.json ran
    2.25 ± 0.07 times faster than ./test -libjson=false ./5MB.json
Benchmark 1: ./test ./10MB.json
  Time (mean ± σ):      97.9 ms ±  1.2 ms    [User: 142.4 ms, System: 22.7 ms]
  Range (min … max):    95.0 ms … 100.4 ms    29 runs

Benchmark 2: ./test -libjson=false ./10MB.json
  Time (mean ± σ):     228.8 ms ±  1.7 ms    [User: 257.9 ms, System: 17.7 ms]
  Range (min … max):   226.2 ms … 231.4 ms    13 runs

Summary
  ./test ./10MB.json ran
    2.34 ± 0.03 times faster than ./test -libjson=false ./10MB.json


$ go test ./... -bench=.
goos: linux
goarch: amd64
pkg: github.com/xnacly/libjson
cpu: AMD Ryzen 7 3700X 8-Core Processor
BenchmarkLibJson-16           21  53156910 ns/op  34744407 B/op  500024 allocs/op
BenchmarkEncodingJson-16       8 134535103 ns/op  39723750 B/op  650033 allocs/op
PASS
ok      github.com/xnacly/libjson  2.404s
?       github.com/xnacly/libjson/cmd   [no test files]
?       github.com/xnacly/libjson/test  [no test files]
```
Or as tables:
| backend | 1 MB | 5 MB | 10 MB |
|---|---|---|---|
| encoding/json | 24.9ms | 116.7ms | 225.9ms |
| libjson | 12.4ms (-12.5ms) | 51.8ms (-64.9ms) | 97.9ms (-128ms) |
| backend | ns/op | B/op | allocs/op |
|---|---|---|---|
| encoding/json | 134_535_103 | 39_723_750 | 650_033 |
| libjson | 53_156_910 (-81_378_193) | 34_744_407 (-4_979_343) | 500_024 (-150_009) |
## Previous improvements

Since I made most of these changes before thinking about this blog article, I will just list them and their relative impact (-ms for an improvement, +ms for a regression) for each input size:
| change | impact 1 MB | impact 5 MB | impact 10 MB |
|---|---|---|---|
| disabled unique key checking for objects, duplicates just overwrite the previous value | not tested | not tested | not tested |
| lex on demand, instead of before starting the parser | -4ms | -14ms | not tested |
| use `*(*string)(unsafe.Pointer(&l.buf))` instead of `string()` to skip an allocation | -0.4ms | -2ms | -4ms |
| move allocations and string/number parsing to the parser | -1ms | -2ms | -4ms |
| fast paths for `true`, `false` and `null` & byte slices instead of strings for string/number tokens | -6ms | -25ms | -60ms |
| total | -11.4ms | -43ms | -68ms |
## Replacing byte slices with offsets and lengths

Currently, a `token` structure is defined as:
```go
type token struct {
	Type t_json
	Val  []byte
}
```
`t_json` defines all possible types of tokens a JSON document can contain, while `token.Val` is only filled for `t_number` and `t_string`:
```go
const (
	t_string       t_json = iota // anything between ""
	t_number                     // floating point, hex, etc
	t_true                       // true
	t_false                      // false
	t_null                       // null
	t_left_curly                 // {
	t_right_curly                // }
	t_left_braket                // [
	t_right_braket               // ]
	t_comma                      // ,
	t_colon                      // :
	t_eof                        // for any non structure characters outside of strings and numbers
)
```
Replacing `token.Val` with `token.Start` and `token.End` does not exactly remove an allocation (see [Go Slices: usage and internals](https://go.dev/blog/slices-intro)), but it reduces the size the `token` struct occupies in memory. The new `token` structure:
```go
type token struct {
	Type t_json
	// only populated for number and string
	Start int
	End   int
}
```
Switching the lexer and the parser from byte slices to offsets wasn't that hard and results in the following runtimes:
| backend | 1 MB | 5 MB | 10 MB |
|---|---|---|---|
| encoding/json | 24.9ms | 116.7ms | 225.9ms |
| libjson before | 12.4ms (-12.5ms) | 51.8ms (-64.9ms) | 97.9ms (-128ms) |
| libjson | 12ms (-0.4ms) | 49.8ms (-2ms) | 93.8ms (-4.1ms) |
And the following `Benchmark*` output:
| backend | ns/op | B/op | allocs/op |
|---|---|---|---|
| encoding/json | 134_535_103 | 39_723_750 | 650_033 |
| libjson before | 53_156_910 (-81_378_193) | 34_744_407 (-4_979_343) | 500_024 (-150_009) |
| libjson | 50_504_689 (-2_652_221) | 34_744_392 (-15) | 500_024 |
## Manually inlining `parser::expect`

Currently the parser uses the `parser::expect` method to throw an error if the current `token.Type` does not match the expected `t_json`:
```go
func (p *parser) expect(t t_json) error {
	if p.cur_tok.Type != t {
		return fmt.Errorf("Unexpected %q at this position, expected %q", tokennames[p.cur_tok.Type], tokennames[t])
	}
	return p.advance()
}
```
It is, for example, used in `parser::array`:
```go
func (p *parser) array() ([]any, error) {
	err := p.expect(t_left_braket)
	if err != nil {
		return nil, err
	}

	if p.cur_tok.Type == t_right_braket {
		err = p.advance()
		return []any{}, err
	}

	a := make([]any, 0, 8)

	for p.cur_tok.Type != t_eof && p.cur_tok.Type != t_right_braket {
		if len(a) > 0 {
			err := p.expect(t_comma)
			if err != nil {
				return nil, err
			}
		}
		node, err := p.expression()
		if err != nil {
			return nil, err
		}
		a = append(a, node)
	}

	return a, p.expect(t_right_braket)
}
```
This change replaces every `parser::expect` invocation with the body of the function:
```go
func (p *parser) array() ([]any, error) {
	if p.cur_tok.Type != t_left_braket {
		return nil, fmt.Errorf("Unexpected %q at this position, expected %q", tokennames[p.cur_tok.Type], tokennames[t_left_braket])
	}
	err := p.advance()
	if err != nil {
		return nil, err
	}

	if p.cur_tok.Type == t_right_braket {
		return []any{}, p.advance()
	}

	a := make([]any, 0, 8)

	for p.cur_tok.Type != t_eof && p.cur_tok.Type != t_right_braket {
		if len(a) > 0 {
			if p.cur_tok.Type != t_comma {
				return nil, fmt.Errorf("Unexpected %q at this position, expected %q", tokennames[p.cur_tok.Type], tokennames[t_comma])
			}
			err := p.advance()
			if err != nil {
				return nil, err
			}
		}
		node, err := p.expression()
		if err != nil {
			return nil, err
		}
		a = append(a, node)
	}

	if p.cur_tok.Type != t_right_braket {
		return nil, fmt.Errorf("Unexpected %q at this position, expected %q", tokennames[p.cur_tok.Type], tokennames[t_right_braket])
	}

	return a, p.advance()
}
```
The results of this change:
| backend | 1 MB | 5 MB | 10 MB |
|---|---|---|---|
| encoding/json | 24.9ms | 116.7ms | 225.9ms |
| libjson before | 12ms (-12.9ms) | 49.8ms (-66.9ms) | 93.8ms (-134.1ms) |
| libjson | 11.5ms (-0.5ms) | 48.5ms (-1.3ms) | 91ms (-2.3ms) |
| backend | ns/op | B/op | allocs/op |
|---|---|---|---|
| encoding/json | 134_535_103 | 39_723_750 | 650_033 |
| libjson before | 50_504_689 (-84_030_414) | 34_744_392 (-4_979_358) | 500_024 (-150_009) |
| libjson | 49_262_503 (-1_242_186) | 34_744_160 (-232) | 500_024 |
## Parallelising `lexer::next` and `parser::parse`

Moving from lexing on demand to prelexing in parallel did not result in a decrease in runtime, but instead regressed performance towards the `encoding/json` results. I tested buffered channel sizes from 2, 8, 16 and 32 up to 50k (because that's the number of objects I use in the benchmarks), and none of these yielded any runtime improvements.
Currently, said on-demand lexing is attached to the parser as follows:
```go
type parser struct {
	l       lexer
	cur_tok token
}

func (p *parser) advance() error {
	t, err := p.l.next()
	p.cur_tok = t
	if p.cur_tok.Type == t_eof && err != nil {
		return err
	}
	return nil
}
```
Replacing this with a channel and moving the tokenisation to a goroutine:
```go
type parser struct {
	l       lexer
	c       <-chan token
	cur_tok token
}

func (p *parser) advance() {
	p.cur_tok = <-p.c
}
```
Previously, the lexer and parser invocation worked as shown below:
```go
func New(data []byte) (JSON, error) {
	p := parser{l: lexer{data: data}}
	obj, err := p.parse()
	if err != nil {
		return JSON{}, err
	}
	return JSON{obj}, nil
}
```
Now we have to manage error handling, channel creation, etc.:
```go
const chanSize = 64

func New(data []byte) (JSON, error) {
	l := lexer{data: data}
	c := make(chan token, chanSize)
	var lexerErr error
	go func() {
		for {
			if tok, err := l.next(); err == nil {
				if tok.Type == t_eof {
					break
				}
				c <- tok
			} else {
				lexerErr = err
				break
			}
		}
		close(c)
		c = nil
	}()
	p := parser{l: l, c: c}
	obj, err := p.parse()
	if lexerErr != nil {
		return JSON{}, lexerErr
	}
	if err != nil {
		return JSON{}, err
	}
	return JSON{obj}, nil
}
```
The changes are kept in the `paralell-lexer-and-parser` branch. Benchmarking this sadly did not satisfy my search for better performance, but I wanted to include it anyway:
| backend | 1 MB | 5 MB | 10 MB |
|---|---|---|---|
| encoding/json | 24.9ms | 116.7ms | 225.9ms |
| libjson before | 11.5ms (-13.4ms) | 48.5ms (-68.2ms) | 91ms (-136.4ms) |
| libjson | 24.3ms (+12.8ms) | 112.5ms (+64ms) | 218.6ms (+127.6ms) |
| backend | ns/op | B/op | allocs/op |
|---|---|---|---|
| encoding/json | 134_535_103 | 39_723_750 | 650_033 |
| libjson before | 50_504_689 (-84_030_414) | 34_744_392 (-4_979_358) | 500_024 (-150_009) |
| libjson | 125_533_704 (+75_029_015) | 35_949_392 (+1_205_000) | 500_031 (+7) |
## Results

Again, these results stem from design choices, benchmarking, profiling and the exploration of many approaches similar to the parallelisation attempt above.
| backend | 1 MB | 5 MB | 10 MB |
|---|---|---|---|
| encoding/json | 24.9ms | 116.7ms | 225.9ms |
| libjson | 11.5ms (-13.4ms) | 48.5ms (-68.2ms) | 91ms (-136.4ms) |
| backend | ns/op | B/op | allocs/op |
|---|---|---|---|
| encoding/json | 134_535_103 | 39_723_750 | 650_033 |
| libjson | 50_504_689 (-84_030_414) | 34_744_392 (-4_979_358) | 500_024 (-150_009) |
Summing up, libjson is currently around 2.5x faster than `encoding/json`. For 50k objects in a JSON array, it uses 5 MB less memory and makes 150k fewer allocations - pretty good.