pprof进阶使用

火焰图

安装go-torch

uber 开源工具go-torch，可以直接读取 golang profiling 数据，并生成一个火焰图的 svg 文件。

go get -v github.com/uber/go-torch

火焰图 svg 文件可以通过浏览器打开，可以通过点击每个方块来 zoom in 检查它内部的内容。

火焰图的调用顺序从下到上，每个方块代表一个函数，它上面一层表示这个函数会调用哪些函数，方块的大小代表了占用 CPU 使用的长短。火焰图的配色并没有特殊的意义，默认的红、黄配色是为了更像火焰而已。

go-torch 工具的使用非常简单，没有任何参数的话，它会尝试从http://localhost:8080/debug/pprof/profile获取 profiling 数据。它有三个常用的参数可以调整：

-u –url：要访问的 URL，这里只是主机和端口部分
-s –suffix：pprof profile 的路径，默认为 /debug/pprof/profile
–seconds：要执行 profiling 的时间长度，默认为 30s

安装 FlameGraph

要生成火焰图，需要事先安装 FlameGraph工具，这个工具的安装很简单（需要perl环境支持），只要把对应的可执行文件加入到环境变量中即可。

下载安装perl：https://www.perl.org/get.html
下载FlameGraph：git clone https://github.com/brendangregg/FlameGraph.git
将FlameGraph目录加入到操作系统的环境变量中。

压测工具wrk

https://github.com/wg/wrk 或 https://github.com/adjust/go-wrk

火焰图与压测工具结合使用

以服务型应用为例，源码见

使用wrk对version接口进行压测:

go-wrk -n 50000 http://127.0.0.1:8080/version

在上面压测进行的同时，打开另一个终端执行:

go-torch -u http://127.0.0.1:8080 -t 30

30秒之后终端会出现如下提示：

INFO[18:01:01] Run pprof command: go tool pprof -raw -seconds 30 http://127.0.0.1:8080/debug/pprof/profile
INFO[18:01:31] Writing svg to torch.svg

然后使用浏览器打开torch.svg就能看到如下火焰图了：

火焰图

火焰图的y轴表示cpu调用方法的先后，x轴表示在每个采样调用时间内，方法所占的时间百分比，越宽代表占据cpu时间越多。通过火焰图就可以更清楚的找出耗时长的函数调用，然后不断的修正代码，重新采样，不断优化。

此外还可以借助火焰图分析内存性能数据：

go-torch -inuse_space http://127.0.0.1:8080/debug/pprof/heap
go-torch -inuse_objects http://127.0.0.1:8080/debug/pprof/heap
go-torch -alloc_space http://127.0.0.1:8080/debug/pprof/heap
go-torch -alloc_objects http://127.0.0.1:8080/debug/pprof/heap

函数优化案例

以优化基准测试的lengthOfNonRepeatingSubStr函数为例

执行go test -bench=Substr -v -cpuprofile=cpu.prof -benchmem在性能测试的同时取得 CPU profiling 数据文件。
执行go tool pprof -http=":8081" cpu.prof，访问http://localhost:8081

通过上图可以看出，大部分时间都花在了map相关操作

知道了性能瓶颈所在之后，优化源代码，用[]int代替map

func lengthOfNonRepeatingSubStr1(s string) int {
	lastOccurred := make([]int, 0xffff)
	start := 0
	maxLength := 0

	for i, ch := range []rune(s) {
		if lastI := lastOccurred[ch]; lastI > start {
			start = lastI
		}
		if i-start+1 > maxLength {
			maxLength = i - start + 1
		}
		lastOccurred[ch] = i + 1
	}

	return maxLength
}

再执行测试命令：

zhhnzw$ go test -bench=Substr -v -cpuprofile=cpu.prof -benchmem
=== RUN   TestSubstr
--- PASS: TestSubstr (0.00s)
goos: darwin
goarch: amd64
cpu: Intel(R) Core(TM) i7-7920HQ CPU @ 3.10GHz
BenchmarkSubstr
    nonrepeating_test.go:41: len(s) = 491520
    nonrepeating_test.go:41: len(s) = 491520
    nonrepeating_test.go:41: len(s) = 491520
BenchmarkSubstr-8            601           1968375 ns/op         1179663 B/op          2 allocs/op
PASS

可以看到，执行效率从5152279 ns/op进步到了1968375 ns/op，内存消耗从655590 B/op退步到了1179663 B/op，再执行go tool pprof -http=":8081" cpu.prof，访问http://localhost:8081

优化后

可以看到，现在CPU性能消耗只剩下了stringtoslicerune，涉及到的就是rune(s)操作，这里是需要一个utf-8的解码操作耗费了CPU计算时间，因为必须要支持中文，这个解码操作是省不了的，所以CPU的优化到此就完成了。

接下来再来看内存，观察到函数每次被调用每次都会创建一个slice，这一点可以优化，把lastOccurred变量提取到全局变量，在函数内每次都清掉它的值，这样这个slice变量就只用初始化一次了

var lastOccurred = make([]int, 0xffff)
func lengthOfNonRepeatingSubStr2(s string) int {

	for i,l:=0,len(lastOccurred);i<l;i++ {
		lastOccurred[i]=0
	}
	start := 0
	maxLength := 0

	chs := []rune(s)
  // 这也是个优化细节，用 i<len(chs) 会每次调用 len 函数，下面这样写，len 函数只会调用一次
	for i,l:=0,len(chs);i<l;i++ {  
		if lastI := lastOccurred[chs[i]]; lastI > start {
			start = lastI
		}
		if i-start+1 > maxLength {
			maxLength = i - start + 1
		}
		lastOccurred[chs[i]] = i + 1
	}
  
	return maxLength
}

再执行测试命令：

zhhnzw$ go test -bench=Substr -v -cpuprofile=cpu.prof -benchmem
=== RUN   TestSubstr
--- PASS: TestSubstr (0.00s)
goos: darwin
goarch: amd64
cpu: Intel(R) Core(TM) i7-7920HQ CPU @ 3.10GHz
BenchmarkSubstr
    nonrepeating_test.go:41: len(s) = 491520
    nonrepeating_test.go:41: len(s) = 491520
    nonrepeating_test.go:41: len(s) = 491520
BenchmarkSubstr-8            620           1927372 ns/op          655362 B/op          1 allocs/op
PASS

可以看到，内存消耗进步到了655362 B/op，内存分配次数从2次优化到了1次，至此CPU和Memory都优化完毕了。

pprof与服务型应用

访问：http://127.0.0.1:8080/debug/pprof/
启动压力测试：go-wrk -n 50000 http://127.0.0.1:8080/version
点击 web 页面的 profile 按钮，收集信息，30 s 后会自动下载 profile 文件
拿到 profile 文件后：go tool pprof -http=":8081" profile

定位api接口阻塞案例

编写如下有问题的代码，/push接口通过queue推送数据，但没有消费者，在推送1024次之后就会陷入阻塞：

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof"
	"runtime"
)

var queue chan string

func main() {
	runtime.SetBlockProfileRate(1)
	runtime.SetMutexProfileFraction(1)
	queue = make(chan string, 1024)
	//go consumer()
	http.HandleFunc("/push", pushData)
	http.HandleFunc("/test", test)
	if err := http.ListenAndServe(":6060", nil); err != nil {
		log.Fatal(err)
	}
}

func pushData(w http.ResponseWriter, r *http.Request) {
	queue <- "hello"
	w.WriteHeader(http.StatusOK)
	if _, err := w.Write([]byte("success")); err != nil {
		log.Println(err)
	}
}

func test(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
	if _, err := w.Write([]byte("hello")); err != nil {
		log.Println(err)
	}
}

func consumer() {
	for data := range queue {
		log.Println(data)
	}
}

运行服务，执行如下基准测试后陷入阻塞：

func BenchmarkSendRequest(b *testing.B) {
	for i := 0; i < b.N; i++ {
		resp, err := http.Get("http://127.0.0.1:6060/push")
		b.Log(resp, err)
	}
}

打开http://127.0.0.1:6060/debug/pprof/goroutine?debug=1，看到如下结果：

注：最好休息几分钟再做这个操作，因为刚执行完基准测试，还有1000多个正常处理了请求的存活的goroutine未被回收，会对定位问题造成干扰，等几分钟之后，阻塞的goroutine就会浮出水面了。

goroutine profile: total 9
3 @ 0x10382b6 0x1006825 0x10063dd 0x125cff8 0x120a66f 0x120c009 0x120d31b 0x1209397 0x10678c1
#	0x125cff7	main.pushData+0x37			/Users/zhhnzw/workspace/mygithub/go_demos/interview/block/block.go:25
#	0x120a66e	net/http.HandlerFunc.ServeHTTP+0x2e	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:2084
#	0x120c008	net/http.(*ServeMux).ServeHTTP+0x148	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:2462
#	0x120d31a	net/http.serverHandler.ServeHTTP+0x43a	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:2916
#	0x1209396	net/http.(*conn).serve+0x5d6		/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:1966

2 @ 0x10382b6 0x10319d7 0x1061de9 0x10a6a32 0x10a75da 0x10a75c8 0x115bf29 0x1166be5 0x12032ff 0x10678c1
#	0x1061de8	internal/poll.runtime_pollWait+0x88		/usr/local/Cellar/[email protected]/1.18.6/libexec/src/runtime/netpoll.go:302
#	0x10a6a31	internal/poll.(*pollDesc).wait+0x31		/usr/local/Cellar/[email protected]/1.18.6/libexec/src/internal/poll/fd_poll_runtime.go:83
#	0x10a75d9	internal/poll.(*pollDesc).waitRead+0x259	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/internal/poll/fd_poll_runtime.go:88
...

通过第一行，可以看到程序现存9个goroutine，第二行开头数字3，表示有3个goroutine的调用栈是如下段落的展示结果，再下一个锻炼开头数字2，表示接下来这个段落的调用栈有2个goroutine在执行。

通过这一步已经可以发现问题了，block.go:25，含义是有3个goroutine都执行到了block.go的第25行，意味着这里多半是发生了阻塞。但接下来还是进一步定位问题：

再打开http://127.0.0.1:6060/debug/pprof/goroutine?debug=2，看到如下结果：

goroutine 2126 [running]:
runtime/pprof.writeGoroutineStacks({0x1341040, 0xc000292000})
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/runtime/pprof/pprof.go:694 +0x70
runtime/pprof.writeGoroutine({0x1341040?, 0xc000292000?}, 0x1015d2f?)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/runtime/pprof/pprof.go:683 +0x2b
runtime/pprof.(*Profile).WriteTo(0x12a7a60?, {0x1341040?, 0xc000292000?}, 0xc?)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/runtime/pprof/pprof.go:332 +0x14b
net/http/pprof.handler.ServeHTTP({0xc000020521, 0x9}, {0x1342e90, 0xc000292000}, 0x2e302e302e373231?)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/pprof/pprof.go:253 +0x4a5
net/http/pprof.Index({0x1342e90?, 0xc000292000}, 0xc001170100)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/pprof/pprof.go:371 +0x13e
net/http.HandlerFunc.ServeHTTP(0x9d849a8?, {0x1342e90?, 0xc000292000?}, 0x100e165?)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:2084 +0x2f
net/http.(*ServeMux).ServeHTTP(0xc00002052b?, {0x1342e90, 0xc000292000}, 0xc001170100)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:2462 +0x149
net/http.serverHandler.ServeHTTP({0xc000e04210?}, {0x1342e90, 0xc000292000}, 0xc001170100)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:2916 +0x43b
net/http.(*conn).serve(0xc000110a00, {0x1343148, 0xc0001013e0})
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:1966 +0x5d7
created by net/http.(*Server).Serve
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:3071 +0x4db

goroutine 1 [IO wait]:
internal/poll.runtime_pollWait(0x9d84a98, 0x72)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/runtime/netpoll.go:302 +0x89
internal/poll.(*pollDesc).wait(0xc000164600?, 0x4?, 0x0)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/internal/poll/fd_poll_runtime.go:83 +0x32
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/internal/poll/fd_poll_runtime.go:88
internal/poll.(*FD).Accept(0xc000164600)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/internal/poll/fd_unix.go:614 +0x22c
net.(*netFD).accept(0xc000164600)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/fd_unix.go:172 +0x35
net.(*TCPListener).accept(0xc00000e210)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/tcpsock_posix.go:139 +0x28
net.(*TCPListener).Accept(0xc00000e210)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/tcpsock.go:288 +0x3d
net/http.(*Server).Serve(0xc00017e000, {0x1342ce0, 0xc00000e210})
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:3039 +0x385
net/http.(*Server).ListenAndServe(0xc00017e000)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:2968 +0x7d
net/http.ListenAndServe(...)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:3222
main.main()
	/Users/zhhnzw/workspace/mygithub/go_demos/interview/block/block.go:19 +0xdb

goroutine 2074 [chan send, 14 minutes]:
main.pushData({0x1342e90, 0xc0010dc0e0}, 0x0?)
	/Users/zhhnzw/workspace/mygithub/go_demos/interview/block/block.go:25 +0x38
net/http.HandlerFunc.ServeHTTP(0x9df1c40?, {0x1342e90?, 0xc0010dc0e0?}, 0x100e165?)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:2084 +0x2f
net/http.(*ServeMux).ServeHTTP(0x0?, {0x1342e90, 0xc0010dc0e0}, 0xc001170400)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:2462 +0x149
net/http.serverHandler.ServeHTTP({0xc001158a20?}, {0x1342e90, 0xc0010dc0e0}, 0xc001170400)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:2916 +0x43b
net/http.(*conn).serve(0xc001117180, {0x1343148, 0xc0001013e0})
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:1966 +0x5d7
created by net/http.(*Server).Serve
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:3071 +0x4db

goroutine 2104 [IO wait]:
internal/poll.runtime_pollWait(0x9d849a8, 0x72)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/runtime/netpoll.go:302 +0x89
internal/poll.(*pollDesc).wait(0xc00167e180?, 0xc000e04221?, 0x0)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/internal/poll/fd_poll_runtime.go:83 +0x32
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/internal/poll/fd_poll_runtime.go:88
internal/poll.(*FD).Read(0xc00167e180, {0xc000e04221, 0x1, 0x1})
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/internal/poll/fd_unix.go:167 +0x25a
net.(*netFD).Read(0xc00167e180, {0xc000e04221?, 0x0?, 0x0?})
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/fd_posix.go:55 +0x29
net.(*conn).Read(0xc0000ba020, {0xc000e04221?, 0x0?, 0x0?})
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/net.go:183 +0x45
net/http.(*connReader).backgroundRead(0xc000e04210)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:672 +0x3f
created by net/http.(*connReader).startBackgroundRead
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:668 +0xca

goroutine 2130 [chan send, 14 minutes]:
main.pushData({0x1342e90, 0xc0002920e0}, 0x0?)
	/Users/zhhnzw/workspace/mygithub/go_demos/interview/block/block.go:25 +0x38
net/http.HandlerFunc.ServeHTTP(0x9d846d8?, {0x1342e90?, 0xc0002920e0?}, 0x100e165?)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:2084 +0x2f
net/http.(*ServeMux).ServeHTTP(0x0?, {0x1342e90, 0xc0002920e0}, 0xc000973300)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:2462 +0x149
net/http.serverHandler.ServeHTTP({0xc000b63650?}, {0x1342e90, 0xc0002920e0}, 0xc000973300)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:2916 +0x43b
net/http.(*conn).serve(0xc000111b80, {0x1343148, 0xc0001013e0})
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:1966 +0x5d7
created by net/http.(*Server).Serve
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:3071 +0x4db
...

这个页面的结果展示了所有存活的goroutine的调用栈。

第一个段落goroutine 2126 [running]含义是：id为2126的这个goroutine，当前状态是running，往下是它的调用栈，从调用栈可以发现是pprof的功能，不用管，往下看第二个段落，goroutine 1 [IO wait]，id为1的这个goroutine状态为IO wait，从调用栈可以看出是http服务相关的，本段落最后一行block.go:19可以看出，是if err := http.ListenAndServe(":6060", nil); err != nil {这一行代码，这是需要阻塞在这里的，没毛病。

再接着往下看，goroutine 2130 [chan send, 14 minutes]，id为1的这个goroutine，在做chan send操作，已经执行了14 minutes，这显然是有问题的，从调用栈可以看到是在执行block.go:25，25行代码是queue <- "hello"，到此，完全定位到了问题。

通过对比两次执行pprof的差异来快速锁定问题

刚刚执行完benchmark之后执行如下命令：

$ Fetching profile over HTTP from http://127.0.0.1:6060/debug/pprof/goroutine
Saved profile in /Users/zhhnzw/pprof/pprof.goroutine.013.pb.gz
Type: goroutine
Time: Jan 24, 2023 at 8:34pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 1028, 99.90% of 1029 total
Dropped 22 nodes (cum <= 5)
Showing top 10 nodes out of 17
      flat  flat%   sum%        cum   cum%
      1028 99.90% 99.90%       1028 99.90%  runtime.gopark
         0     0% 99.90%       1024 99.51%  bufio.(*Reader).ReadLine
         0     0% 99.90%       1024 99.51%  bufio.(*Reader).ReadSlice
         0     0% 99.90%       1024 99.51%  bufio.(*Reader).fill
         0     0% 99.90%       1026 99.71%  internal/poll.(*FD).Read
         0     0% 99.90%       1027 99.81%  internal/poll.(*pollDesc).wait
         0     0% 99.90%       1027 99.81%  internal/poll.(*pollDesc).waitRead
         0     0% 99.90%       1027 99.81%  internal/poll.runtime_pollWait
         0     0% 99.90%       1026 99.71%  net.(*conn).Read
         0     0% 99.90%       1026 99.71%  net.(*netFD).Read
(pprof)

每次执行如上命令都会给出profile文件的路径，第二次执行加上-base参数指定上一次的profile文件，那么给出的分析结果就是2个pprof文件的差异。在第一次执行go tool pprof http://127.0.0.1:6060/debug/pprof/goroutine之后发起一个curl http://127.0.0.1:6060/push，再执行如下操作：

$ go tool pprof -base /Users/zhhnzw/pprof/pprof.goroutine.013.pb.gz http://127.0.0.1:6060/debug/pprof/goroutine
Fetching profile over HTTP from http://127.0.0.1:6060/debug/pprof/goroutine
Saved profile in /Users/zhhnzw/pprof/pprof.goroutine.014.pb.gz
Type: goroutine
Time: Jan 24, 2023 at 7:56pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 3, 0.15% of 2061 total
Showing top 10 nodes out of 24
      flat  flat%   sum%        cum   cum%
         3  0.15%  0.15%          3  0.15%  runtime.gopark
         0     0%  0.15%          1 0.049%  bufio.(*Reader).ReadLine
         0     0%  0.15%          1 0.049%  bufio.(*Reader).ReadSlice
         0     0%  0.15%          1 0.049%  bufio.(*Reader).fill
         0     0%  0.15%          2 0.097%  internal/poll.(*FD).Read
         0     0%  0.15%          2 0.097%  internal/poll.(*pollDesc).wait
         0     0%  0.15%          2 0.097%  internal/poll.(*pollDesc).waitRead (inline)
         0     0%  0.15%          2 0.097%  internal/poll.runtime_pollWait
         0     0%  0.15%          1 0.049%  main.pushData
         0     0%  0.15%          2 0.097%  net.(*conn).Read
(pprof) list main.pushData
Total: 2061
ROUTINE ======================== main.pushData in /Users/zhhnzw/workspace/mygithub/go_demos/interview/block/block.go
         0          1 (flat, cum) 0.049% of Total
         .          .     20:           log.Fatal(err)
         .          .     21:   }
         .          .     22:}
         .          .     23:
         .          .     24:func pushData(w http.ResponseWriter, r *http.Request) {
         .          1     25:   queue <- "hello"
         .          .     26:   w.WriteHeader(http.StatusOK)
         .          .     27:   if _, err := w.Write([]byte("success")); err != nil {
         .          .     28:           log.Println(err)
         .          .     29:   }
         .          .     30:}

从如上结果就可以看出，有1个goroutine阻塞在了第25行。

Previouspprof Nextdelve

Last updated 2 years ago

Was this helpful?