Optimizing a go package by 38147125738.8x

Table of Contents

Tags:

I am using a project of a friend of mine to get a list of currently open ports on a system: linux-open-ports.

I noticed the implemention is a little slower than i expected, so i went to investigate. I cached the inode to pid mapping for a open port lookup dependency and made the whole process 38147125738.8x faster.

A Baseline

Go provides the tooling for establishing a baseline via benchmarks, so lets make use of this and write one:

GO
 1package linuxopenports
 2
 3import "testing"
 4
 5func BenchmarkGetOpenPorts(b *testing.B) {
 6	ports, err := GetOpenPorts()
 7	if err != nil {
 8		b.Fatal(err)
 9	}
10	if len(ports) == 0 {
11		b.Fatal("no ports detected")
12	}
13}

This yields a not too fast result on my system:

TEXT
1goos: linux
2goarch: amd64
3pkg: github.com/intevel/linux-open-ports
4cpu: AMD Ryzen 7 PRO 7840U w/ Radeon 780M Graphics
5BenchmarkGetOpenPorts
6BenchmarkGetOpenPorts-16               1        1570580328 ns/op
7PASS
8ok      github.com/intevel/linux-open-ports     1.588s

Concept of the pkg

The way of getting a list of open ports in linux based systems is pretty easy to understand:

  1. Get a list of all connections and their inodes, these are stored at:

    • /proc/net/tcp
    • /proc/net/tcp6
    • /proc/net/udp
    • /proc/net/udp6

    All of the above follow the same format:

    TEXT
    1~ :: head -n1 < /proc/net/tcp
    2sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
    3 0: 3600007F:0035 00000000:0000 0A 00000000:00000000 00:00000000 00000000   991        0 15761 1 0000000000000000 100 0 0 10 5

    There are no process ids (PID) here, we have to map all found inode’s to their respective pids on the system

    Tip

    inode refers to a datastructure describing a file-system object, see wikipedia
  2. Read each /proc/<pid>/fd directory to access their socket inodes (files in the format socket:[<inode>])

  3. Return a list of ports with their inodes and process id’s

Diving in

linux-open-ports exports a function GetOpenPorts and the corresponding type OpenPort:

GO
1type OpenPort struct {
2	Protocol string
3	Port     int
4	PID      int
5}
6
7func GetOpenPorts() ([]OpenPort, error) {
8    // ...
9}

The implemention roughly follows the outlined logic introduced before.

GetOpenPorts calls findPIDByInode for each inode in each file:

GO
 1func GetOpenPorts() ([]OpenPort, error) {
 2	var openPorts []OpenPort
 3	uniquePorts := make(map[string]bool)
 4
 5	protocolFiles := map[string][]string{
 6		"tcp": {"/proc/net/tcp", "/proc/net/tcp6"},
 7		"udp": {"/proc/net/udp", "/proc/net/udp6"},
 8	}
 9
10	for protocol, files := range protocolFiles {
11		for _, filePath := range files {
12            // .. scanner setup and error handling
13			for scanner.Scan() {
14				fields := strings.Fields(scanner.Text())
15                // .. field checks and assignments
16				inode := fields[9]
17				pid := findPIDByInode(inode)
18                // ..
19			}
20		}
21	}
22
23	return openPorts, nil
24}

The findPIDByInode function reads the whole /proc dir on each invocation to find the process the inode belongs to.

GO
 1func findPIDByInode(inode string) int {
 2	procDirs, _ := os.ReadDir("/proc")
 3	for _, procDir := range procDirs {
 4		if !procDir.IsDir() || !isNumeric(procDir.Name()) {
 5			continue
 6		}
 7
 8		pid := procDir.Name()
 9		fdDir := filepath.Join("/proc", pid, "fd")
10		fdFiles, err := os.ReadDir(fdDir)
11		if err != nil {
12			continue
13		}
14
15		for _, fdFile := range fdFiles {
16			fdPath := filepath.Join(fdDir, fdFile.Name())
17			link, err := os.Readlink(fdPath)
18			if err == nil && strings.Contains(link, fmt.Sprintf("socket:[%s]", inode)) {
19				pidInt, _ := strconv.Atoi(pid)
20				return pidInt
21			}
22		}
23	}
24	return -1
25}
26
27func isNumeric(s string) bool {
28	_, err := strconv.Atoi(s)
29	return err == nil
30}

Parsing integers is slow

The first low-hanging fruit catching my eye is the isNumeric function call, which attempts to parse the directory name and checks if it is a number to make sure the directory is the directory of a process. Using unicode.IsDigit on the first byte of the directory name should be sufficient and faster:

GO
 1func findPIDByInode(inode string) int {
 2	procDirs, _ := os.ReadDir("/proc")
 3	for _, procDir := range procDirs {
 4		pid := procDir.Name()
 5		if !procDir.IsDir() || !unicode.IsDigit(rune(pid[0])) {
 6			continue
 7		}
 8
 9		fdDir := filepath.Join("/proc", pid, "fd")
10		fdFiles, err := os.ReadDir(fdDir)
11		if err != nil {
12			continue
13		}
14
15		for _, fdFile := range fdFiles {
16			fdPath := filepath.Join(fdDir, fdFile.Name())
17			link, err := os.Readlink(fdPath)
18			if err == nil && strings.Contains(link, fmt.Sprintf("socket:[%s]", inode)) {
19				pidInt, _ := strconv.Atoi(pid)
20				return pidInt
21			}
22		}
23	}
24	return -1
25}

Running the benchmark tells us we got down from 1570580328 ns/op to 1108555474 ns/op (1.57s to 1.11s, 0.46s or 1.41x faster).

TEXT
1goos: linux
2goarch: amd64
3pkg: github.com/intevel/linux-open-ports
4cpu: AMD Ryzen 7 PRO 7840U w/ Radeon 780M Graphics
5BenchmarkGetOpenPorts
6BenchmarkGetOpenPorts-16               1        1108555474 ns/op
7PASS
8ok      github.com/intevel/linux-open-ports     1.123s

Lets cache the mapping and gain a 38147125738.8x speedup

The informed reader will notice iterating the /proc/ dir every time we hit a new inode is not that smart. Instead we should do it once and look up found inodes in the inode to pid mapping:

GO
 1func inodePIDMap() map[string]string {
 2	m := map[string]string{}
 3	procDirs, _ := os.ReadDir("/proc")
 4	for _, procDir := range procDirs {
 5		pid := procDir.Name()
 6		if !procDir.IsDir() && !unicode.IsDigit(rune(pid[0])) {
 7			continue
 8		}
 9
10		fdDir := filepath.Join("/proc", pid, "fd")
11		fdFiles, err := os.ReadDir(fdDir)
12		if err != nil {
13			continue
14		}
15
16		for _, fdFile := range fdFiles {
17			path := filepath.Join(fdDir, fdFile.Name())
18			linkName, err := os.Readlink(path)
19			if err != nil {
20				continue
21			}
22			if strings.Contains(linkName, "socket") {
23				// index 8:till end -1 because socket:[ is 8 bytes long and ]
24				// is at the end
25				inode := linkName[8 : len(linkName)-1]
26				m[inode] = pid
27			}
28		}
29	}
30	return m
31}

The GetOpenPorts function has to be updated to match this change:

GO
 1func GetOpenPorts() ([]OpenPort, error) {
 2	var openPorts []OpenPort
 3	uniquePorts := make(map[string]bool)
 4
 5	protocolFiles := map[string][]string{
 6		"tcp": {"/proc/net/tcp", "/proc/net/tcp6"},
 7		"udp": {"/proc/net/udp", "/proc/net/udp6"},
 8	}
 9
10	cachedInodePIDMap := inodePIDMap()
11
12	for protocol, files := range protocolFiles {
13		for _, filePath := range files {
14            // .. scanner setup and error handling
15			for scanner.Scan() {
16				fields := strings.Fields(scanner.Text())
17                // .. field checks and assignments
18				inode := fields[9]
19				pid, ok := cachedInodePIDMap[inode]
20				if !ok {
21					continue
22				}
23                // ..
24			}
25		}
26	}
27
28	return openPorts, nil
29}

The speedup is enormous, from 1108555474 ns/op to 0.02906 ns/op, corresponding to 38147125739.848587x speedup.

Old benchmark

TEXT
1goos: linux
2goarch: amd64
3pkg: github.com/intevel/linux-open-ports
4cpu: AMD Ryzen 7 PRO 7840U w/ Radeon 780M Graphics
5BenchmarkGetOpenPorts
6BenchmarkGetOpenPorts-16               1        1108555474 ns/op
7PASS
8ok      github.com/intevel/linux-open-ports     1.123s
TEXT
1goos: linux
2goarch: amd64
3pkg: github.com/intevel/linux-open-ports
4cpu: AMD Ryzen 7 PRO 7840U w/ Radeon 780M Graphics
5BenchmarkGetOpenPorts-16        1000000000               0.02906 ns/op
6PASS
7ok      github.com/intevel/linux-open-ports     0.235s

I would go further on this, but the above change already results in such a large performance increase, I’m going to call it a day. However if it wouldn’t be much faster, my next steps would be to remove the fmt.Sprintf in the for-loop body. Beforehand I would spin up a profiler and investigate hot functions.

I already upstreamed these changes, see refactor: cache inode->pid lookup via map on function startup #1 for the diff.