- Jh123x: Blog, Code, Fun and everything in between./
- My Blog Posts and Stories/
- Project based Learning: BuilderGen/
Project based Learning: BuilderGen
Table of Contents
Introduction #
BuilderGen is a tool that for generating builder patterns based on go struct
s.
In this blog post, I will be going through the different things I have learnt while working on this project.
For each of the iterations, I have a benchmark which acts as a way to see how each optimization has improved the time BuilderGen
to generate the struct
for the same file.
You can view the benchmark here.
Initial Version (v0.0.1) #
The initial commit, this is the first version of BuilderGen
.
Everything is clobbered together to get it working.
There is nothing not much optimization that was taken into mind.
The runtime of the benchmark is 1083661 ns/op
.
This is how the code roughly functions
I/O improvements (v0.0.2) #
After looking at how the program flowed, there is 1 main optimization which I immediately noticed. There were multiple calls to file I/O which were done. This File I/O were usually very slow, thus in the next version I aimed to improve this.
In this version, I aimed to tackle the file I/O issue to reduce the time taken by the program to generate 1 struct file. Instead of writing to the file before the formatting is completed.
With this simple change, the performance improved almost 2x
from the previous 1083661 ns/op
to 536149 ns/op
.
Instead of writing to file during the generation phase, it will output a string that is passed to the formatter. The tradeoff is that we will need to hold the whole file in memory to process it. However, as the file is usually very small, this should be ok in most cases.
Using tmpl #
I also explored using tmpl
instead of just using the string.Builder{}
.
You can look at the template code here
As expected, the template implementation is slower than that of strings.Builder
at 823289 ns/op
instead of 536149 ns/op
.
Pros #
- Easier to maintain the templates
- Some of range loops can be done in the template
Cons #
- Slower
In the end, I’ve decided to keep the strings.Builder
implementation as it as significantly faster.
Attribute Checks (v0.0.3) #
The part of the code here is actually the most interesting. For this part of the code, I’ve decided to improve on the code that parses the names of the field.
This optimization is not really due to the fact that this section is a bottleneck. It is more of a fun problem to consider to see how much I can optimize it.
Before we go into the optimizations, let us go through what we have to optimize.
func IsKeyword(keyword string) bool {
...
}
The IsKeyword
function above is a function that checks if a given string is a golang
keyword.
The list of golang keywords are found in Appendix A.
Here is my journey to make this function go as fast as possible. After going through all the code, I’ll go through the runtime for each of the code for comparison.
Naive implementation #
For the naive implementation, I just loop through all the possible keywords This is the 1st implementation that anyone will think of when starting to code this.
The code for it is something as follows
var KEYWORDS = []string{"go", "func", ...}
func IsKeyword(word string) bool {
for _, keyword := range KEYWORDS {
if word == keyword {
return true
}
}
return false
}
The runtime of this is O(nk) where n is the length of the keyword (25) and k is the length of the keyword
or word
whichever is lower.
I used this as a baseline to compare to the rest of the code below.
Golang Map #
The next step up is a hashmap that can compute the hash in O(1) time and lookup the bucket to compare the result to the word. The code for it is something as follows
var KEYWORDS = map[string]struct{}{
"go": struct{}
...
}
...
func IsKeyword(word string) bool {
_, ok := KEYWORDS[word]
return ok
}
The runtime of this algorithm is O(len(word)+k) where k is the length of the keyword
or word
whichever is lower.
Assuming len(word)
time to generate the hash + K time to compare the string in the bucket
Switch Statement #
The next step is a switch statement that compares
func IsKeyword(s string) bool {
switch s {
case consts.KEYWORD_GO, consts.KEYWORD_IF, consts.KEYWORD_FOR, consts.KEYWORD_MAP, consts.KEYWORD_VAR, consts.KEYWORD_BREAK, consts.KEYWORD_CASE, consts.KEYWORD_CHAN, consts.KEYWORD_ELSE, consts.KEYWORD_FUNC, consts.KEYWORD_GOTO, consts.KEYWORD_TYPE, consts.KEYWORD_CONST, consts.KEYWORD_DEFER, consts.KEYWORD_RANGE, consts.KEYWORD_RETURN, consts.KEYWORD_SELECT, consts.KEYWORD_STRUCT, consts.KEYWORD_SWITCH, consts.KEYWORD_IMPORT, consts.KEYWORD_DEFAULT, consts.KEYWORD_PACKAGE, consts.KEYWORD_CONTINUE, consts.KEYWORD_INTERFACE, consts.KEYWORD_FALLTHROUGH:
return true
default:
return false
}
}
This is literally a switch statement with all of the keywords as a case to return true and anything else returns false. I am not sure of the runtime of this as it really depends on how Golang optimizes a switch statement.
List + Ptr #
This is the 1st optimization that I have added.
The underlying optimization here is to have an index that will loop through words of the same length as a keyword.
When it is equal to a keyword, it will return true
, otherwise, it returns false.
buckets := [10][2]int{
{0, 2}, // 2
{2, 5}, // 3
{5, 11}, // 4
{11, 15}, // 5
{15, 20}, // 6
{20, 22}, // 7
{22, 23}, // 8
{23, 24}, // 9
{0, 0}, // 10
{24, 25}, // 11
}
func IsKeyword(s string) bool {
if len(s) < 2 || len(s) > 11 {
return false
}
rangeVal := buckets[len(s)-2]
for i := rangeVal[0]; i < rangeVal[1]; i++ {
if consts.Keywords[i] == s {
return true
}
}
return false
}
The runtime of this is O(nk) where n is the number keywords in the comparison and k is the length of the word/keyword.
Custom Hash #
The last algorithm is a custom hash that I came up with. This will calculate the ascii sum of the word and compares it to the word that is found in the buckets that I have generated.
I discovered this when I realize that the ascii
sum of all the golang keywords is unique.
After this key discovery, I needed to find a modulus that will result in a even split of all the keywords in an array.
After brute forcing, I stumbled upon the magic number of 73 which resulted in the table below.
KwHashMap := [73]string{"", "switch", "", "goto", "", "", "break", "defer", "", "", "import", "default", "type", "", "range", "return", "fallthrough", "", "", "", "struct", "", "", "", "", "", "map", "", "", "", "", "", "", "", "", "for", "", "var", "", "", "const", "", "", "", "", "chan", "", "case", "", "", "", "", "", "", "", "", "select", "", "", "package", "else", "if", "", "func", "", "", "continue", "", "go", "interface", "", "", ""}
func IsKeyword(s string) bool {
if len(s) < 2 || len(s) > 11 {
return false
}
hash, i := 0, 0
for i < len(s) {
hash += int(s[i])
i++
}
if len(KwHashMap[hash%73]) != len(s) {
return false
}
return KwHashMap[hash%73] == s
}
The runtime of this algorithm is O(len(word) + k) where k is the length of the word/keyword whichever is lower.
Conclusion #
After going through the different iterations here, I created a benchmark to time the amount of time each algorithm will take.
You can find the benchmark code here The main snippet of the code is here
func BenchmarkKeywordLookup(b *testing.B) {
testAlgorithms := map[string]algo{
"Loop": naiveIteration(),
"Map": naiveMap(),
"Current": IsKeyword,
"Switch": naiveSwitch(),
"List and Ptr": attempt1(),
"Custom Hash": attempt2(),
}
tests := make([]kwTest, 0, 50)
for i := 0; i < 25; i++ {
kw := consts.Keywords[i]
tests = append(tests, kwTest{keyword: kw, expectedRes: true})
for j := 0; j < 25*10; j++ {
kw1 := strings.Repeat(string([]byte{byte('a') + byte(i+j)}), i)
tests = append(tests, kwTest{keyword: kw1, expectedRes: false})
}
}
for name, algorithm := range testAlgorithms {
b.Run(name, func(b *testing.B) {
for i := 0; i < b.N; i++ {
t := tests[i%50]
assert.Equal(b, t.expectedRes, algorithm(t.keyword), t.keyword)
}
})
}
}
The goal is to create a 1:10 proportion of words that are tested to get a general gauge of how long the code will take to differentiate keywords from non-keywords.
The proportion here is important as I expect that most of the structs which use this builder will likely not use a keyword as a struct
value.
Function | Runtime |
---|---|
Loop | 182.3 ns/op |
Map | 174.4 ns/op |
Switch | 156.1 ns/op |
List and Ptr | 163.0 ns/op |
Custom Hash | 157.8 ns/op |
As expected, the naive implementation with the loop is the lower as it iterates through all the words (n) and compare all the letters of the words (k) resulting in O(nk)
.
Maps makes it faster by implementing a hash (O(k)
) and comparison (O(k)
) to have a runtime of ~O(2k)
.
With some compiler magic, the switch statement performs the best, seems like compliers really are magic.
For my list and pointer methods, there can be > 1 comparisons but it saves the time on the hash function, this allowed it to edge out the map implementation.
Lastly for the custom hash, it will not do any comparisons most of the time as the buckets are mostly empty and not of the correct length.
Computing the hash required O(k)
For those which still require the comparison, there will still be the cost of 1 comparison (k).
Remove Extra runtime checks (v0.0.4) #
After looking at my code more, I realized that with enough parsing in my string builder, I can remove the additional imports and formatting code at the end by fine tuning my strings.Builder
.
By adding that simple fix, the runtime of the code decreased significantly.
By removing the format/import steps, it cuts the runtime by another 100000 ns/op
and reduced the runtime of generating the struct
file to 293983 ns/op
.
Conclusion #
Version | Runtime of CodeGen | Changes |
---|---|---|
v0.0.1 | 1083661 ns/op | Initial Version |
v0.0.2 | 536149 ns/op | Format Builder in memory |
v0.0.2 (alt) | 823289 ns/op | Use templates instead of string builder |
v0.0.3 | 483838 ns/op | Optimize keyword check |
v0.0.4 | 293983 ns/op | Manual format/import pkgs |
Overall this is the runtime of the code throughout the different generations of BuilderGen
and the different lessons that I have learnt.
I hope that the lessons that are learnt here can help you improve the performance of your code as well.
Links #
Appendix A: A list of Golang
keywords #
break default func interface select
case defer go map struct
chan else goto package switch
const fallthrough if range type
continue for import return var