While looking through my repos, an issue in go-iso8601-duration, which I previously missed, popped up, informing me of a missing license in the last release, resulting in missing documentation for the v1.1.0 release (Since the gopkg website doesnt render documentation for non licensed projects).
Since I had to do a new release either way, I took the opportunity and benchmarked the current state.
Benchmarks and a Baseline
I reused the cases previously employed for checking the FSM to establish some micro benchmarks spanning ALL branches the FSM can take. Since I also run the tests normally before benchmarking, I dont have to validate results in the benchmark.
For an iso8601 FSM deepdive and a rant about the ISO org, see my previous article: Handrolling ISO8601 Duration Support for Go
1var testcases = []struct {
2 str string
3 dur Duration
4}{
5 {"P0D", Duration{}},
6 {"PT15H", Duration{hour: 15}},
7 {"P1W", Duration{week: 1}},
8 {"P15W", Duration{week: 15}},
9 {"P1Y15W", Duration{year: 1, week: 15}},
10 {"P15Y", Duration{year: 15}},
11 {"P15Y3M", Duration{year: 15, month: 3}},
12 {"P15Y3M41D", Duration{year: 15, month: 3, day: 41}},
13 {"PT15M", Duration{minute: 15}},
14 {"PT15M10S", Duration{minute: 15, second: 10}},
15 {
16 "P3Y6M4DT12H30M5S",
17 Duration{
18 year: 3,
19 month: 6,
20 day: 4,
21 hour: 12,
22 minute: 30,
23 second: 5,
24 },
25 },
26}
27
28func BenchmarkDuration(b *testing.B) {
29 for _, i := range testcases {
30 b.Run(i.str, func(b *testing.B) {
31 for n := 0; n < b.N; n++ {
32 _, _ = From(i.str)
33 }
34 })
35 }
36}Benchmarks use Go’s testing package on:
TEXT1goos: linux 2goarch: amd64 3pkg: github.com/xnacly/go-iso8601-duration 4cpu: AMD Ryzen 7 3700X 8-Core ProcessorExecuted with
go test ./... -bench=. -benchmem
1goos: linux
2goarch: amd64
3pkg: github.com/xnacly/go-iso8601-duration
4cpu: AMD Ryzen 7 3700X 8-Core Processor
5BenchmarkDuration/P0D-16 24741985 48.10 ns/op 8 B/op 1 allocs/op
6BenchmarkDuration/PT15H-16 18533790 63.26 ns/op 8 B/op 1 allocs/op
7BenchmarkDuration/P1W-16 24164163 48.64 ns/op 8 B/op 1 allocs/op
8BenchmarkDuration/P15W-16 20793126 58.09 ns/op 8 B/op 1 allocs/op
9BenchmarkDuration/P1Y15W-16 13904265 84.21 ns/op 8 B/op 1 allocs/op
10BenchmarkDuration/P15Y-16 19943671 59.35 ns/op 8 B/op 1 allocs/op
11BenchmarkDuration/P15Y3M-16 14227842 83.24 ns/op 8 B/op 1 allocs/op
12BenchmarkDuration/P15Y3M41D-16 10262575 115.6 ns/op 8 B/op 1 allocs/op
13BenchmarkDuration/PT15M-16 18228138 63.90 ns/op 8 B/op 1 allocs/op
14BenchmarkDuration/PT15M10S-16 12289743 95.84 ns/op 8 B/op 1 allocs/op
15BenchmarkDuration/P3Y6M4DT12H30M5S-16 6758479 177.3 ns/op 8 B/op 1 allocs/op
16PASS
17ok github.com/xnacly/go-iso8601-duration 13.952sWarning - Note
At the end of each section I will show the updated output of the benchsmarks with an average improvement in percent, this will be relative to the previous change, not to the original results.Replacing the numBuffer with a running int64
The first weird detail I noticed, was previous me using bytes.Buffer to store
the single characters making up the numbers for parsing the numeric value of
each designator, for instance 593 in P593W, when a running int64 would
suffice.
1func From(s string) (Duration, error) {
2 var duration Duration
3 curState := stateStart
4 var col uint8
5 numBuf := *bytes.NewBuffer(make([]byte, 0, 8))
6
7 for {
8 // ...
9 switch curState {
10 // ...
11 case stateTNumber:
12 if unicode.IsDigit(b) {
13 numBuf.WriteRune(b)
14 curState = stateTNumber
15 } else if strings.ContainsRune(timeDesignators, b) {
16 if numBuf.Len() == 0 {
17 return duration, wrapErr(MissingNumber, col)
18 }
19 num, err := numBufferToNumber(numBuf)
20 if err != nil {
21 return duration, err
22 }
23 // using the number
24 numBuf.Reset()
25 curState = stateTDesignator
26 } else {
27 return duration, wrapErr(UnknownDesignator, col)
28 }
29 }
30 }
31}Especially since the bytes.Buffer interactions require multiple function
calls for writing, a call for reading and allocations for the underlying
buffer, while the final number parsing itself in numBufferToNum forming a
second pass of the integer characters.
1func numBufferToNumber(buf bytes.Buffer) (int64, error) {
2 var i int64
3 for _, n := range buf.Bytes() {
4 digit := int64(n - '0')
5 if i > (math.MaxInt64-digit)/10 {
6 return 0, DesignatorNumberTooLarge
7 }
8 i = (i * 10) + digit
9 }
10
11 return i, nil
12}For this problem, all of this is totally unnecessary and can
be fused into a running int64 temporary value, updating
value on seeing numeric bytes and then using the integer
when creating the designators field of the
goiso8601duration.Duration struct:
1diff --git a/duration.go b/duration.go
2index 80a3b4c..4506457 100644
3--- a/duration.go
4+++ b/duration.go
5@@ -128,7 +128,8 @@ func From(s string) (Duration, error) {
6
7 curState := stateStart
8 var col uint8
9- numBuf := *bytes.NewBuffer(make([]byte, 0, 8))
10+ var num int64
11+ var hasNum bool
12 r := strings.NewReader(s)
13
14 for {
15@@ -171,23 +172,25 @@ func From(s string) (Duration, error) {
16 if b == 'T' {
17 curState = stateT
18 } else if unicode.IsDigit(b) {
19- numBuf.WriteRune(b)
20+ num = (num * 10) + int64(b-'0')
21+ hasNum = true
22 curState = stateNumber
23 } else {
24 return duration, wrapErr(MissingNumber, col)
25 }
26 case stateNumber:
27 if unicode.IsDigit(b) {
28- numBuf.WriteRune(b)
29+ digit := int64(b - '0')
30+ if num > (math.MaxInt64-digit)/10 {
31+ return duration, DesignatorNumberTooLarge
32+ }
33+ num = (num * 10) + digit
34+ hasNum = true
35 curState = stateNumber
36 } else if strings.ContainsRune(defaultDesignators, b) {
37- if numBuf.Len() == 0 {
38+ if !hasNum {
39 return duration, wrapErr(MissingNumber, col)
40 }
41- num, err := numBufferToNumber(numBuf)
42- if err != nil {
43- return duration, err
44- }
45 switch b {
46 case 'Y':
47 if duration.year != 0 {
48@@ -210,30 +213,36 @@ func From(s string) (Duration, error) {
49 }
50 duration.day = num
51 }
52- numBuf.Reset()
53+ num = 0
54 curState = stateDesignator
55 } else {
56 return duration, wrapErr(UnknownDesignator, col)
57 }
58 case stateT, stateTDesignator:
59 if unicode.IsDigit(b) {
60- numBuf.WriteRune(b)
61+ digit := int64(b - '0')
62+ if num > (math.MaxInt64-digit)/10 {
63+ return duration, DesignatorNumberTooLarge
64+ }
65+ num = (num * 10) + digit
66+ hasNum = true
67 curState = stateTNumber
68 } else {
69 return duration, wrapErr(MissingNumber, col)
70 }
71 case stateTNumber:
72 if unicode.IsDigit(b) {
73- numBuf.WriteRune(b)
74+ digit := int64(b - '0')
75+ if num > (math.MaxInt64-digit)/10 {
76+ return duration, DesignatorNumberTooLarge
77+ }
78+ num = (num * 10) + digit
79+ hasNum = true
80 curState = stateTNumber
81 } else if strings.ContainsRune(timeDesignators, b) {
82- if numBuf.Len() == 0 {
83+ if !hasNum {
84 return duration, wrapErr(MissingNumber, col)
85 }
86- num, err := numBufferToNumber(numBuf)
87- if err != nil {
88- return duration, err
89- }
90 switch b {
91 case 'H':
92 if duration.hour != 0 {
93@@ -251,7 +260,7 @@ func From(s string) (Duration, error) {
94 }
95 duration.second = num
96 }
97- numBuf.Reset()
98+ num = 0
99 curState = stateTDesignator
100 } else {
101 return duration, wrapErr(UnknownDesignator, col)
The above diff shows:
- Parsing number via
numwhile iterating (no more second pass numeric parsing) - Overflow checks, analog to the previous version, returing
DesignatorNumberTooLargeon occurrence - No more calls to
numBuf.{WriteRune,Reset,Bytes}in the hotpath, or anywhere - No allocations anymore
Benchmarks show an average runtime reduction of ~45%:
1goos: linux
2goarch: amd64
3pkg: github.com/xnacly/go-iso8601-duration
4cpu: AMD Ryzen 7 3700X 8-Core Processor
5BenchmarkDuration/P0D-16 45560650 25.55 ns/op 0 B/op 0 allocs/op
6BenchmarkDuration/PT15H-16 34413621 35.08 ns/op 0 B/op 0 allocs/op
7BenchmarkDuration/P1W-16 46468417 25.84 ns/op 0 B/op 0 allocs/op
8BenchmarkDuration/P15W-16 37708790 31.77 ns/op 0 B/op 0 allocs/op
9BenchmarkDuration/P1Y15W-16 26145358 45.38 ns/op 0 B/op 0 allocs/op
10BenchmarkDuration/P15Y-16 37434289 31.84 ns/op 0 B/op 0 allocs/op
11BenchmarkDuration/P15Y3M-16 26251488 45.27 ns/op 0 B/op 0 allocs/op
12BenchmarkDuration/P15Y3M41D-16 18727663 63.59 ns/op 0 B/op 0 allocs/op
13BenchmarkDuration/PT15M-16 33159258 35.16 ns/op 0 B/op 0 allocs/op
14BenchmarkDuration/PT15M10S-16 21954288 53.80 ns/op 0 B/op 0 allocs/op
15BenchmarkDuration/P3Y6M4DT12H30M5S-16 10706078 111.9 ns/op 0 B/op 0 allocs/op
16PASS
17ok github.com/xnacly/go-iso8601-duration 13.605sThrowing unicode.IsDigit away for ‘0’ <= b && b <= ‘9’
The least impactful change is inlining unicode.IsDigit,
but only the ascii (<= MaxLatin1) portion, since the filter
out the non-ascii inputs.
1package unicode
2
3// IsDigit reports whether the rune is a decimal digit.
4func IsDigit(r rune) bool {
5 if r <= MaxLatin1 {
6 return '0' <= r && r <= '9'
7 }
8 return isExcludingLatin(Digit, r)
9}This change results in ~6% less runtime:
1goos: linux
2goarch: amd64
3pkg: github.com/xnacly/go-iso8601-duration
4cpu: AMD Ryzen 7 3700X 8-Core Processor
5BenchmarkDuration/P0D-16 47473572 24.55 ns/op 0 B/op 0 allocs/op
6BenchmarkDuration/PT15H-16 35477791 32.96 ns/op 0 B/op 0 allocs/op
7BenchmarkDuration/P1W-16 48368948 24.96 ns/op 0 B/op 0 allocs/op
8BenchmarkDuration/P15W-16 40997600 29.37 ns/op 0 B/op 0 allocs/op
9BenchmarkDuration/P1Y15W-16 28936015 41.42 ns/op 0 B/op 0 allocs/op
10BenchmarkDuration/P15Y-16 39383282 30.27 ns/op 0 B/op 0 allocs/op
11BenchmarkDuration/P15Y3M-16 27719198 42.48 ns/op 0 B/op 0 allocs/op
12BenchmarkDuration/P15Y3M41D-16 20594792 58.53 ns/op 0 B/op 0 allocs/op
13BenchmarkDuration/PT15M-16 34655841 33.56 ns/op 0 B/op 0 allocs/op
14BenchmarkDuration/PT15M10S-16 24108157 50.88 ns/op 0 B/op 0 allocs/op
15BenchmarkDuration/P3Y6M4DT12H30M5S-16 11649555 102.50 ns/op 0 B/op 0 allocs/op
16PASS
17ok github.com/xnacly/go-iso8601-duration 13.601sReplacing strings.Reader with for-range rune iteration
1go test ./... -bench=. -cpuprofile=cpu
2go tool pprof -http :8080 cpu
While the next section targets strings.ContainsRune, this section focuses on
strings.(*Reader).ReadRune. Specifically requesting a new rune.
1for {
2 b, size, err := r.ReadRune()
3 if err != nil {
4 if err != io.EOF {
5 return duration, wrapErr(UnexpectedReaderError, col)
6 } else {
7 // other error handling here
8 }
9 }
10 if size > 1 {
11 return duration, wrapErr(UnexpectedNonAsciiRune, col)
12 }
13 col++
14
15 switch curState {
16 // ...
17 }
18}Each iteration requires a ReadRune call, multiple branches
for error handling and ascii checking, since i want to error
on non ascii inputs.
This is simply not necessary, since I can iterate over the
runes of the input string using a for-range loop, check for
ascii validity with r <= unicode.MaxASCII:
1diff --git a/duration.go b/duration.go
2index e4083bb..52eec76 100644
3--- a/duration.go
4+++ b/duration.go
5@@ -2,11 +2,11 @@ package goiso8601duration
6
7 import (
8 "encoding/json"
9- "io"
10 "math"
11 "strconv"
12 "strings"
13 "time"
14+ "unicode"
15 )
16
17 // constants for roundtripping between time.Duration and Duration
18@@ -112,33 +112,13 @@ func From(s string) (Duration, error) {
19 }
20
21 curState := stateStart
22- var col uint8
23 var num int64
24 var hasNum bool
25- r := strings.NewReader(s)
26-
27- for {
28- b, size, err := r.ReadRune()
29- if err != nil {
30- if err != io.EOF {
31- return duration, wrapErr(UnexpectedReaderError, col)
32- } else if curState == stateP {
33- // being in stateP at the end (io.EOF) means we havent
34- // encountered anything after the P, so there were no numbers
35- // or states
36- return duration, wrapErr(UnexpectedEof, col)
37- } else if curState == stateNumber || curState == stateTNumber {
38- // if we are in the state of Number or TNumber we had a number
39- // but no designator at the end
40- return duration, wrapErr(MissingDesignator, col)
41- } else {
42- curState = stateFin
43- }
44- }
45- if size > 1 {
46+
47+ for col, b := range s {
48+ if b > unicode.MaxASCII {
49 return duration, wrapErr(UnexpectedNonAsciiRune, col)
50 }
51- col++
52
53 switch curState {
54 case stateStart:
55@@ -254,6 +234,19 @@ func From(s string) (Duration, error) {
56 return duration, nil
57 }
58 }
59+
60+ if curState == stateP {
61+ // being in stateP at the end (io.EOF) means we havent
62+ // encountered anything after the P, so there were no numbers
63+ // or states
64+ return duration, wrapErr(UnexpectedEof, len(s))
65+ } else if curState == stateNumber || curState == stateTNumber {
66+ // if we are in the state of Number or TNumber we had a number
67+ // but no designator at the end
68+ return duration, wrapErr(MissingDesignator, len(s))
69+ } else {
70+ return duration, nil
71+ }
72 }
73
74 func (i Duration) Apply(t time.Time) time.Time {
Thus reducing the hot path complexity by multiple function calls, branches and
the bullshit strings.(*Reader).ReadRune does:
1package strings
2
3// ReadRune implements the [io.RuneReader] interface.
4func (r *Reader) ReadRune() (ch rune, size int, err error) {
5 if r.i >= int64(len(r.s)) {
6 r.prevRune = -1
7 return 0, 0, io.EOF
8 }
9 r.prevRune = int(r.i)
10 ch, size = utf8.DecodeRuneInString(r.s[r.i:])
11 r.i += int64(size)
12 return
13}These changes result in ~38% runtime reduction:
1goos: linux
2goarch: amd64
3pkg: github.com/xnacly/go-iso8601-duration
4cpu: AMD Ryzen 7 3700X 8-Core Processor
5BenchmarkDuration/P0D-16 71080273 16.55 ns/op 0 B/op 0 allocs/op
6BenchmarkDuration/PT15H-16 59493799 19.94 ns/op 0 B/op 0 allocs/op
7BenchmarkDuration/P1W-16 70588683 16.73 ns/op 0 B/op 0 allocs/op
8BenchmarkDuration/P15W-16 62168469 18.98 ns/op 0 B/op 0 allocs/op
9BenchmarkDuration/P1Y15W-16 43747168 28.85 ns/op 0 B/op 0 allocs/op
10BenchmarkDuration/P15Y-16 63284880 19.99 ns/op 0 B/op 0 allocs/op
11BenchmarkDuration/P15Y3M-16 38357061 29.42 ns/op 0 B/op 0 allocs/op
12BenchmarkDuration/P15Y3M41D-16 28748721 41.85 ns/op 0 B/op 0 allocs/op
13BenchmarkDuration/PT15M-16 53639156 20.78 ns/op 0 B/op 0 allocs/op
14BenchmarkDuration/PT15M10S-16 37592148 31.97 ns/op 0 B/op 0 allocs/op
15BenchmarkDuration/P3Y6M4DT12H30M5S-16 16793139 71.17 ns/op 0 B/op 0 allocs/op
16PASS
17ok github.com/xnacly/go-iso8601-duration 13.434sRemoving duplicate Branching in Designator validitiy checks
Lets look at the flamegraph again:

The offender strings.ContainsRune. The previous implementation employed a
duplicate check for making sure non numeric chars match the expected characters
at a given FSM state.
1// In representations of duration,
2// the following designators are used as part of the expression,
3// see the doc comment of the function
4//
5// [Y] [M] [W] [D] [H] [M] [S]
6const (
7 defaultDesignators = "YMWD"
8 timeDesignators = "MHS"
9)For instance in stateTNumber, there is a branch for a numeric character, and
a conditional call to strings.ContainsRune before switching on each
designator in timeDesignators again:
1case stateTNumber:
2 if '0' <= b && b <= '9' {
3 digit := int64(b - '0')
4 if num > (math.MaxInt64-digit)/10 {
5 return duration, DesignatorNumberTooLarge
6 }
7 num = (num * 10) + digit
8 hasNum = true
9 curState = stateTNumber
10 } else if strings.ContainsRune(timeDesignators, b) {
11 if !hasNum {
12 return duration, wrapErr(MissingNumber, col)
13 }
14 switch b {
15 case 'H':
16 if duration.hour != 0 {
17 return duration, wrapErr(DuplicateDesignator, col)
18 }
19 duration.hour = num
20 case 'M':
21 if duration.minute != 0 {
22 return duration, wrapErr(DuplicateDesignator, col)
23 }
24 duration.minute = num
25 case 'S':
26 if duration.second != 0 {
27 return duration, wrapErr(DuplicateDesignator, col)
28 }
29 duration.second = num
30 }
31 num = 0
32 curState = stateTDesignator
33 } else {
34 return duration, wrapErr(UnknownDesignator, col)
35 }The same approach is used for stateNumber, only with defaultDesignators
instead of timeDesignators.
While strings.ContainsRune is fairly optimised by having a assembly fast path for utf8.RuneSelf (characters below RuneSelf are represented as themselves in a single byte), it still has to iterate the haystack for the needle we want to check for. This is fully unnecessary for duration parsing, since the haystack is known at compile time and can therefore be represented as a switch block instead, which we already do, just after checking for validity.
Therefore these paths can be merged into a single one, moving the error on invalid designator from the condition to the default case of the switch block:
1@@ -204,7 +195,7 @@ func From(s string) (Duration, error) {
2 num = (num * 10) + digit
3 hasNum = true
4 curState = stateTNumber
5- } else if strings.ContainsRune(timeDesignators, b) {
6+ } else {
7 if !hasNum {
8 return duration, wrapErr(MissingNumber, col)
9 }
10@@ -224,11 +215,12 @@ func From(s string) (Duration, error) {
11 return duration, wrapErr(DuplicateDesignator, col)
12 }
13 duration.second = num
14+ default:
15+ return duration, wrapErr(UnknownDesignator, col)
16 }
17 num = 0
18 curState = stateTDesignator
19- } else {
20- return duration, wrapErr(UnknownDesignator, col)
21 }
This change reduces the runtime by 28%:
1goos: linux
2goarch: amd64
3pkg: github.com/xnacly/go-iso8601-duration
4cpu: AMD Ryzen 7 3700X 8-Core Processor
5BenchmarkDuration/P0D-16 100000000 10.51 ns/op 0 B/op 0 allocs/op
6BenchmarkDuration/PT15H-16 80446293 14.58 ns/op 0 B/op 0 allocs/op
7BenchmarkDuration/P1W-16 100000000 10.33 ns/op 0 B/op 0 allocs/op
8BenchmarkDuration/P15W-16 93517448 12.75 ns/op 0 B/op 0 allocs/op
9BenchmarkDuration/P1Y15W-16 70472916 17.19 ns/op 0 B/op 0 allocs/op
10BenchmarkDuration/P15Y-16 89461000 12.75 ns/op 0 B/op 0 allocs/op
11BenchmarkDuration/P15Y3M-16 61222558 18.65 ns/op 0 B/op 0 allocs/op
12BenchmarkDuration/P15Y3M41D-16 46391493 25.45 ns/op 0 B/op 0 allocs/op
13BenchmarkDuration/PT15M-16 80630947 14.98 ns/op 0 B/op 0 allocs/op
14BenchmarkDuration/PT15M10S-16 55008163 22.14 ns/op 0 B/op 0 allocs/op
15BenchmarkDuration/P3Y6M4DT12H30M5S-16 24965916 46.99 ns/op 0 B/op 0 allocs/op
16PASS
17ok github.com/xnacly/go-iso8601-duration 12.956sReplace rune based for-range with []byte
Since this isnt a function call, we cant view its affect in the flamegraph
directly, but we can hone in on go-iso8601-duration.From in the graph and
then switch to the source-view, showing us a total time taken in the rune
loop and the MaxASCII check:
Before
1108 1.65s 1.65s for col, b := range s {
2109 320ms 320ms if b > unicode.MaxASCII {Replacing this with a []byte cast and thus omitting the MaxASCII check:
1diff --git a/duration.go b/duration.go
2index ba295d7..440e71a 100644
3--- a/duration.go
4+++ b/duration.go
5@@ -6,7 +6,6 @@ import (
6 "strconv"
7 "strings"
8 "time"
9- "unicode"
10 )
11
12 // constants for roundtripping between time.Duration and Duration
13@@ -105,11 +104,7 @@ func From(s string) (Duration, error) {
14 var num int64
15 var hasNum bool
16
17- for col, b := range s {
18- if b > unicode.MaxASCII {
19- return duration, wrapErr(UnexpectedNonAsciiRune, col)
20- }
21-
22+ for col, b := range []byte(s) {
23 switch curState {
24 case stateStart:
25 switch b {
Results in 200ms less time spent in the loop:
1107 1.47s 1.47s for col, b := range []byte(s) {Running the whole suite results in ~8.6% less runtime:
1goos: linux
2goarch: amd64
3pkg: github.com/xnacly/go-iso8601-duration
4cpu: AMD Ryzen 7 3700X 8-Core Processor
5BenchmarkDuration/P0D-16 120725466 9.931 ns/op 0 B/o 0 allocs/op
6BenchmarkDuration/PT15H-16 92520681 11.67 ns/op 0 B/o 0 allocs/op
7BenchmarkDuration/P1W-16 100000000 10.15 ns/op 0 B/o 0 allocs/op
8BenchmarkDuration/P15W-16 100000000 10.26 ns/op 0 B/o 0 allocs/op
9BenchmarkDuration/P1Y15W-16 82810779 14.83 ns/op 0 B/o 0 allocs/op
10BenchmarkDuration/P15Y-16 120902158 10.29 ns/op 0 B/o 0 allocs/op
11BenchmarkDuration/P15Y3M-16 75393909 15.68 ns/op 0 B/o 0 allocs/op
12BenchmarkDuration/P15Y3M41D-16 56214748 22.31 ns/op 0 B/o 0 allocs/op
13BenchmarkDuration/PT15M-16 91787005 12.85 ns/op 0 B/o 0 allocs/op
14BenchmarkDuration/PT15M10S-16 53683030 19.45 ns/op 0 B/o 0 allocs/op
15BenchmarkDuration/P3Y6M4DT12H30M5S-16 28636116 42.47 ns/op 0 B/op 0 allocs/op
16PASS
17ok github.com/xnacly/go-iso8601-duration 14.860sThis is easily explainable when looking at the assembly output. While for col, r := range s compiles to:
for col, r := range []byte(s) compiles to a third of the instructions:
Most noticeably, the latter lacks the call to runtime.decoderune for each
iteration:
1 PCDATA $1, $0
2 CALL runtime.decoderune(SB)So this change not only shaves off 2/3 of the instructions, but it also removes the per iteration utf8 decoding, bounds checks and multi-byte branching. This is possible because the FSM only needs to operate on ASCII inputs (ISO8601 duration is defined on an ASCII subset), allowing the loop to treat the string as raw bytes rather than decoding runes.
Hoisting overflow checks from each numeric change to field creation
Previously the FSM validates the running int64 value wouldnt exceed the valid representation on each numeric character occurence:
1case stateT, stateTDesignator:
2 if '0' <= b && b <= '9' {
3 digit := int64(b - '0')
4 if num > (math.MaxInt64-digit)/10 {
5 return duration, DesignatorNumberTooLarge
6 }
7 num = (num * 10) + digit
8 hasNum = true
9 curState = stateTNumber
10 } else {
11 return duration, wrapErr(MissingNumber, col)
12 }To reduce branches in the hot path I decided to test moving this overflow checking to the location where the duration fields are created and the value is actually used, however this requires widening the running int64 to uint64:
1- var num int64
2+ var num uint64
3
4 case stateT, stateTDesignator:
5 if '0' <= b && b <= '9' {
6- digit := int64(b - '0')
7- if num > (math.MaxInt64-digit)/10 {
8- return duration, DesignatorNumberTooLarge
9- }
10- num = (num * 10) + digit
11+ num = (num * 10) + uint64(b-'0')
12 hasNum = true
13 curState = stateTNumber
14 } else {
And finally re-adding the check to the usage site:
1+ if num > math.MaxInt64 {
2+ return duration, DesignatorNumberTooLarge
3+ }
This was done to have less branches in the hot path for numbers, thus has little to no impact on shorter inputs, still a average ~5% reduction, tho:
1goos: linux
2goarch: amd64
3pkg: github.com/xnacly/go-iso8601-duration
4cpu: AMD Ryzen 7 3700X 8-Core Processor
5BenchmarkDuration/P0D-16 100000000 10.01 ns/op
6BenchmarkDuration/PT15H-16 100000000 11.81 ns/op
7BenchmarkDuration/P1W-16 100000000 10.19 ns/op
8BenchmarkDuration/P15W-16 120588932 9.779 ns/op
9BenchmarkDuration/P1Y15W-16 86265106 13.76 ns/op
10BenchmarkDuration/P15Y-16 122859478 9.910 ns/op
11BenchmarkDuration/P15Y3M-16 77137617 14.32 ns/op
12BenchmarkDuration/P15Y3M41D-16 60263616 20.74 ns/op
13BenchmarkDuration/PT15M-16 96869085 12.30 ns/op
14BenchmarkDuration/PT15M10S-16 63139969 18.33 ns/op
15BenchmarkDuration/P3Y6M4DT12H30M5S-16 29586862 39.14 ns/op
16PASS
17ok github.com/xnacly/go-iso8601-duration 14.812sThis change results in a contract change compared to
v.1.1.0, since previously the overflow check fired on each number, now it only fires on exceeding the valid int64 size on usage. Overflowing uint64 is not handled.
Results
The base idea of these improvements is to eliminate abstractions hiding:
- allocations
- loops
- generic logic
When all of these can be omitted by knowing the problem domain / what durations in the iso8601 sense are made up of, specifically:
- only ascii input
- no decimal points
- compile time known, non dynamic designators
- number and designator pairs are parsable in a single pass
When applied result in a final total runtime performance improvement compared to the original results shown in Benchmarks and a Baseline:
- Geometric Speedup Mean: 6.38x
- Range: 4.82x - 7.60x
- Median: 6.49x
| Benchmark | Baseline (ns) | Improved (ns) | Runtime reduction (%) | Speedup (x) | Delta (ns) |
|---|---|---|---|---|---|
| P0D | 64.26 | 10.01 | 84.43 | 6.42x | 54.25 |
| PT15H | 76.69 | 11.81 | 84.60 | 6.49x | 64.88 |
| P1W | 64.10 | 10.19 | 84.11 | 6.29x | 53.91 |
| P15W | 74.35 | 9.78 | 86.80 | 7.60x | 64.57 |
| P1Y15W | 95.39 | 13.76 | 85.56 | 6.93x | 81.63 |
| P15Y | 74.44 | 9.91 | 86.65 | 7.51x | 64.53 |
| P15Y3M | 96.33 | 14.32 | 85.15 | 6.73x | 82.01 |
| P15Y3M41D | 123.70 | 20.74 | 83.25 | 5.96x | 102.96 |
| PT15M | 76.61 | 12.30 | 83.96 | 6.23x | 64.31 |
| PT15M10S | 104.80 | 18.33 | 82.50 | 5.72x | 86.47 |
| P3Y6M4DT12H30M5S | 188.70 | 39.14 | 79.27 | 4.82x | 149.56 |