惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

Vercel News
Vercel News
SecWiki News
SecWiki News
WordPress大学
WordPress大学
小众软件
小众软件
博客园 - 司徒正美
酷 壳 – CoolShell
酷 壳 – CoolShell
V
Visual Studio Blog
Y
Y Combinator Blog
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
云风的 BLOG
云风的 BLOG
MyScale Blog
MyScale Blog
K
Kaspersky official blog
T
The Exploit Database - CXSecurity.com
腾讯CDC
Scott Helme
Scott Helme
I
InfoQ
Cyberwarzone
Cyberwarzone
Cyber Security Advisories - MS-ISAC
Cyber Security Advisories - MS-ISAC
Security Latest
Security Latest
The Register - Security
The Register - Security
Project Zero
Project Zero
F
Fortinet All Blogs
C
CERT Recently Published Vulnerability Notes
A
Arctic Wolf
C
Cisco Blogs
L
LINUX DO - 热门话题
P
Privacy International News Feed
IT之家
IT之家
U
Unit 42
P
Privacy & Cybersecurity Law Blog
H
Help Net Security
K
KPMG report finds enterprise disconnect between AI and its ROI | CIO
C
Cyber Attacks, Cyber Crime and Cyber Security
P
Palo Alto Networks Blog
F
Full Disclosure
宝玉的分享
宝玉的分享
Simon Willison's Weblog
Simon Willison's Weblog
L
Lohrmann on Cybersecurity
Google DeepMind News
Google DeepMind News
cs.CL updates on arXiv.org
cs.CL updates on arXiv.org
H
Hacker News: Front Page
Know Your Adversary
Know Your Adversary
PCI Perspectives
PCI Perspectives
Hugging Face - Blog
Hugging Face - Blog
AWS News Blog
AWS News Blog
MongoDB | Blog
MongoDB | Blog
S
Schneier on Security
Recent Announcements
Recent Announcements
Forbes - Security
Forbes - Security
Cisco Talos Blog
Cisco Talos Blog

blag

SQLite prefixes its temp files with `etilqs_` - blag Setsum - order agnostic, additive, subtractive checksum - blag Oldest recorded transaction - blag Replacing a cache service with a database - blag SQLite commits are not durable under default settings - blag PSA: SQLite WAL checksums fail silently and may lose data - blag Rickrolling Turso DB (SQLite rewrite in Rust) - blag Collection of insane and fun facts about SQLite - blag How bloom filters made SQLite 10x faster - blag In search of a faster SQLite - blag Galloping Search - blag Building a distributed log using S3 (under 150 lines of Go) - blag Zero Disk Architecture - blag PSA: Most databases do not do checksums by default - blag PSA: SQLite does not do checksums - blag Disaggregated Storage - a brief introduction - blag Why does SQLite (in production) have such a bad rep? - blag SQLite Slaps - blag Now - blag Learning C - blag Snapshot Testing - blag Win: contribution to libSQL (SQLite) codebase - blag Errata in Hekaton MVCC paper - blag Internet is wholesome: MVCC edition - blag It is becoming difficult for me to be productive in Python - blag MongoDB secondary only index - blag Introducing CaskDB – a project to teach you writing a key-value store - blag Recurse Center: Winter Break - blag Recurse Center Day 20: Django v4 upgrade (from v1) - blag Recurse Center Day 19 - blag Recurse Center Day 18 - blag Recurse Center Day 17 - blag Recurse Center Day 16: Open Source - blag Recurse Center Day 15: B Tree Algorithms - blag Recurse Center Day 14: NoSQL Transactions - blag Recurse Center Day 13: Why 'Raft'? - blag Recurse Center Day 12: Isolation Anomalies - blag Recurse Center Day 11: B Tree Insertions - blag Recurse Center Day 10: Learning Distributed Systems - blag Recurse Center Day 9: Papers We Love - blag Recurse Center Day 8: B Tree Fill Factor (Part 2) - blag Recurse Center Day 7: Basics of ncurses - blag Recurse Center Day 6: B Tree Root - blag Recurse Center First Week - blag Recurse Center Day 5: Garbage Collection Algorithms - blag Recurse Center Day 4: B Tree fill factor - blag Recurse Center Day 3: Hammock Driven Development - blag Recurse Center Day 2: BTree Node - blag Recurse Center Day 1: init - blag What I want to do at Recurse Center - blag Accepted to the Recurse Center! - blag Towards Inserting One Billion Rows in SQLite Under A Minute - blag Marshaling Struct with Special Fields to JSON in Golang - blag I ended up adding duplicate records on a unique index in MongoDB - blag Setting up Github Actions for Hugo - blag Moving to Hugo - blag Catching SIGTERM in Python - blag Git/Github fork-pull request-update cycle - blag Using uWSGI with Python 3 - blag When is my Cake Day? - blag Staying Ahead of Amazon, in Amazon Treasure Hunt Contest - blag How I Am Maintaining Multiple Emails For Git On A Same Machine - blag An exploit on Gaana.com gave me access to their entire User Database - blag Flashing Asus-WRT Merlin by XVortex on NetGear NightHawk R7000 - blag Install Windows 8 UEFI on Legacy BIOS with Clover (and Dual boot with Yosemite) - blag Scraping Javascript page using Python - blag Installing Transmission (remote and CLI) client on Raspberry Pi - blag About - blag Projects - blag
Recurse Center Day 24: Hacking Go compiler to add a new keyword - blag
2021-12-08 · via blag

I want to poke around the Go compiler and understand some internals for my next RC project. I want to add a new feature that does something new to the compiler. I found this fantastic article Go compiler internals: adding a new statement to Go by Eli Bendersky, precisely what I want!

Before doing anything significant, I wanted to make a tiny change to get familiar with the toolchain. First, I decided to add an alias loop for for. However, the compiler codebase already uses keyword loop as a label at many places, so I changed my alias. Also, instead of for, I decided to add an alias for var. The rest of this post summarises how I accomplished this and my learnings along the way.

I cloned the Go compiler (this is like ~500mb):

# Already forked on Github
# I have cloned in /Users/avi/code or ~/code
git clone [email protected]:avinassh/go.git go-fork 

# This is the latest release of this writing
git checkout tags/go1.17.4 -b var-vah

The next step was to make a minimal change, compile and make sure it worked. I decided I would rename var to vah so that I can compile the following:

package main

import "fmt"

func main() {
	vah someVar = 42;
	fmt.Println(someVar)
}

If you compile the above, you will get the following error:

# command-line-arguments
./main.go:6:6: syntax error: unexpected someVar at end of statement

The go compiler code resides at src/cmd/compile/internal. To compile, run src/make.bash. Another alternative is to run all.bash, which runs all the tests but can be very slow for rapid development.

There are a couple of things I need to do:

  1. Add a new token, vah, to tokenise my code correctly. That is, recognise a new word called vah in my source code.
  2. Parse the code and generate an identical AST as var. Make it so that it thinks vah is the same as var.

Adding a new token

Adding a new keyword in Go is…interesting:

  1. There is a file tokens.go which maintains an enum of all tokens
  2. The keys point to the tokens
  3. The comment next to them are the actual tokens you have in the source code
// syntax/tokens.go
	_Struct      // struct
	_Switch      // switch
	_Type        // type
	_Var         // var
	_Vah 		 // vah
}

The next step is to generate tokens by using go generate in the syntax directory, but I got an error:

GOROOT=/Users/avi/code/go-fork go generate tokens.go
tokens.go:9: running "stringer": exec: "stringer": executable file not found in $PATH

This error can be fixed easily by installing stringer:

go get golang.org/x/tools/cmd/stringer

(Above might modify go.mod and go.sum, so do a git clean)

Above command modified token_string.go file. I was able to compile the compiler too. But it did not work as expected; when I tried to compile my code, the error was the same as earlier. As if it did not consider any changes I made. I spent the next four hours figuring out and pulling my hair out.

Keyword map generator

Following is the keyword map generator code:

func init() {
	// populate keywordMap
	for tok := _Break; tok <= _Var; tok++ {
		h := hash([]byte(tok.String()))
		if keywordMap[h] != 0 {
			panic("imperfect hash")
		}
		keywordMap[h] = tok
	}
}

Notice how the for loop is hardcoded till _Var, but not till the last item tokenCount. So the keyword generator wasn’t even considering my new token!

Keyword map size

I moved the vah to a line above. This time when I tried to bootstrap the compile, I got an error:

./make.bash

Building Go cmd/dist using /usr/local/Cellar/go/1.17.2/libexec. (go1.17.2 darwin/amd64)
Building Go toolchain1 using /usr/local/Cellar/go/1.17.2/libexec.
Building Go bootstrap cmd/go (go_bootstrap) using Go toolchain1.
panic: imperfect hash

goroutine 1 [running]:
bootstrap/cmd/compile/internal/syntax.init.0()
	/Users/avi/code/go-fork/src/cmd/compile/internal/syntax/scanner.go:431 +0xc5

This error was originating from keyword map generator. From Eli’s blog:

The compiler tries to build a “perfect” hash table to perform keyword string to token lookups. By “perfect” it means it wants no collisions, just a linear array where every keyword maps to a single index. The hash function is rather ad-hoc (it only looks at the contents of the first characters of the string token, for example) and it’s not easy to debug why a new token creates collisions. To work around it, I increased the lookup table size by changing it to [1 « 7]token, thus changing the size of the lookup array from 64 to 128. This gives the hash function much more space to distribute its keys, and the collision went away.

So I changed the size to 128, but the error persisted.

Token Hash

There is a method in syntax/scanner.go which says how the hash is calculated:

// hash is a perfect hash function for keywords.
// It assumes that s has at least length 2.
func hash(s []byte) uint {
	return (uint(s[0])<<4 ^ uint(s[1]) + uint(len(s))) & uint(len(keywordMap)-1)
}

The hash method considers the token’s first and second characters and the length. I was pairing with Miccah, and he pointed to this method; thanks to him, I figured out that var and vah had the same hash, and hence it was breaking!

I decided to use let instead of vah. So my code would be:

package main

import "fmt"

func main() {
	let someVar = 42;
	fmt.Println(someVar)
}

After all these changes, I was able to bootstrap the compiler.

Updating the parser

Updating the parser was fairly simple. When encountering the let token in the parsing step, I wanted to behave as if it encountered var. So I made the following change:

// syntax/parser.go
switch p.tok {
	case _Var, _Let:
		return p.declStmt(p.varDecl)

	case _Const:
		return p.declStmt(p.constDecl)

	case _Type:
		return p.declStmt(p.typeDecl)
	}

The method p.declStmt(p.varDecl) parses the current statement as var.

This change alone was enough to compile my custom code, but it didn’t work at first, and I spent more than two hours debugging why. I had set a GOROOT in my shell, which pointed to my system installation. I had to update this to point to the forked repo, and it worked!

$ GOROOT=/Users/avi/code/go-fork ./go run main.go
42

So far, my changes have been minimal. If a for loop had var declaration, it wouldn’t throw a warning. If I had a global let declaration, it would fail:

package main

import "fmt"

let someVar int = 42

func main() {
	fmt.Println(someVar)
}
# command-line-arguments
./main.go:14:1: syntax error: non-declaration statement outside function body

I updated my code to handle these cases for the sake of completion.

Final words

  1. Hacking on an unknown large codebase is scary at first, but it is satisfying
  2. Due to hash calculations, you can’t add single-character tokens, tokens that are of the same length but differ after 3rd characters onwards
  3. Other than Eli’s post, there are no documentation or articles on Go compiler internals. How does someone get started working on them? How do they navigate and find all these intricacies without spending hours? Maybe Google has some internal documentation on the Go compiler.

(update 9/Dec)Note: This post is getting a lot attention. Any chance you are a regular Go contributor? I would love to learn about the daily development cycle. How do you make changes, how do you do quick tests before running the whole test suite etc. Feel free to reach out to me @iavins or email.