pprof进阶使用

火焰图

安装go-torch

uber 开源工具go-torch,可以直接读取 golang profiling 数据,并生成一个火焰图的 svg 文件。

go get -v github.com/uber/go-torch

火焰图 svg 文件可以通过浏览器打开,可以通过点击每个方块来 zoom in 检查它内部的内容。

火焰图的调用顺序从下到上,每个方块代表一个函数,它上面一层表示这个函数会调用哪些函数,方块的大小代表了占用 CPU 使用的长短。火焰图的配色并没有特殊的意义,默认的红、黄配色是为了更像火焰而已。

go-torch 工具的使用非常简单,没有任何参数的话,它会尝试从http://localhost:8080/debug/pprof/profile获取 profiling 数据。它有三个常用的参数可以调整:

  • -u –url:要访问的 URL,这里只是主机和端口部分

  • -s –suffix:pprof profile 的路径,默认为 /debug/pprof/profile

  • –seconds:要执行 profiling 的时间长度,默认为 30s

安装 FlameGraph

要生成火焰图,需要事先安装 FlameGraph工具,这个工具的安装很简单(需要perl环境支持),只要把对应的可执行文件加入到环境变量中即可。

  1. 下载安装perl:https://www.perl.org/get.html

  2. 下载FlameGraph:git clone https://github.com/brendangregg/FlameGraph.git

  3. FlameGraph目录加入到操作系统的环境变量中。

压测工具wrk

https://github.com/wg/wrk 或 https://github.com/adjust/go-wrk

火焰图与压测工具结合使用

以服务型应用为例,源码见

使用wrk对version接口进行压测:

go-wrk -n 50000 http://127.0.0.1:8080/version

在上面压测进行的同时,打开另一个终端执行:

go-torch -u http://127.0.0.1:8080 -t 30

30秒之后终端会出现如下提示:

INFO[18:01:01] Run pprof command: go tool pprof -raw -seconds 30 http://127.0.0.1:8080/debug/pprof/profile
INFO[18:01:31] Writing svg to torch.svg

然后使用浏览器打开torch.svg就能看到如下火焰图了:

火焰图

火焰图的y轴表示cpu调用方法的先后,x轴表示在每个采样调用时间内,方法所占的时间百分比,越宽代表占据cpu时间越多。通过火焰图就可以更清楚的找出耗时长的函数调用,然后不断的修正代码,重新采样,不断优化。

此外还可以借助火焰图分析内存性能数据:

go-torch -inuse_space http://127.0.0.1:8080/debug/pprof/heap
go-torch -inuse_objects http://127.0.0.1:8080/debug/pprof/heap
go-torch -alloc_space http://127.0.0.1:8080/debug/pprof/heap
go-torch -alloc_objects http://127.0.0.1:8080/debug/pprof/heap

函数优化案例

以优化基准测试的lengthOfNonRepeatingSubStr函数为例

  1. 执行go test -bench=Substr -v -cpuprofile=cpu.prof -benchmem在性能测试的同时取得 CPU profiling 数据文件。

  2. 执行go tool pprof -http=":8081" cpu.prof,访问http://localhost:8081

初始

通过上图可以看出,大部分时间都花在了map相关操作

  1. 知道了性能瓶颈所在之后,优化源代码,用[]int代替map

    func lengthOfNonRepeatingSubStr1(s string) int {
    	lastOccurred := make([]int, 0xffff)
    	start := 0
    	maxLength := 0
    
    	for i, ch := range []rune(s) {
    		if lastI := lastOccurred[ch]; lastI > start {
    			start = lastI
    		}
    		if i-start+1 > maxLength {
    			maxLength = i - start + 1
    		}
    		lastOccurred[ch] = i + 1
    	}
    
    	return maxLength
    }

    再执行测试命令:

    zhhnzw$ go test -bench=Substr -v -cpuprofile=cpu.prof -benchmem
    === RUN   TestSubstr
    --- PASS: TestSubstr (0.00s)
    goos: darwin
    goarch: amd64
    cpu: Intel(R) Core(TM) i7-7920HQ CPU @ 3.10GHz
    BenchmarkSubstr
        nonrepeating_test.go:41: len(s) = 491520
        nonrepeating_test.go:41: len(s) = 491520
        nonrepeating_test.go:41: len(s) = 491520
    BenchmarkSubstr-8            601           1968375 ns/op         1179663 B/op          2 allocs/op
    PASS

    可以看到,执行效率从5152279 ns/op进步到了1968375 ns/op,内存消耗从655590 B/op退步到了1179663 B/op,再执行go tool pprof -http=":8081" cpu.prof,访问http://localhost:8081

优化后

  1. 可以看到,现在CPU性能消耗只剩下了stringtoslicerune,涉及到的就是rune(s)操作,这里是需要一个utf-8的解码操作耗费了CPU计算时间,因为必须要支持中文,这个解码操作是省不了的,所以CPU的优化到此就完成了。

  2. 接下来再来看内存,观察到函数每次被调用每次都会创建一个slice,这一点可以优化,把lastOccurred变量提取到全局变量,在函数内每次都清掉它的值,这样这个slice变量就只用初始化一次了

    var lastOccurred = make([]int, 0xffff)
    func lengthOfNonRepeatingSubStr2(s string) int {
    
    	for i,l:=0,len(lastOccurred);i<l;i++ {
    		lastOccurred[i]=0
    	}
    	start := 0
    	maxLength := 0
    
    	chs := []rune(s)
      // 这也是个优化细节,用 i<len(chs) 会每次调用 len 函数,下面这样写,len 函数只会调用一次
    	for i,l:=0,len(chs);i<l;i++ {  
    		if lastI := lastOccurred[chs[i]]; lastI > start {
    			start = lastI
    		}
    		if i-start+1 > maxLength {
    			maxLength = i - start + 1
    		}
    		lastOccurred[chs[i]] = i + 1
    	}
      
    	return maxLength
    }

    再执行测试命令:

    zhhnzw$ go test -bench=Substr -v -cpuprofile=cpu.prof -benchmem
    === RUN   TestSubstr
    --- PASS: TestSubstr (0.00s)
    goos: darwin
    goarch: amd64
    cpu: Intel(R) Core(TM) i7-7920HQ CPU @ 3.10GHz
    BenchmarkSubstr
        nonrepeating_test.go:41: len(s) = 491520
        nonrepeating_test.go:41: len(s) = 491520
        nonrepeating_test.go:41: len(s) = 491520
    BenchmarkSubstr-8            620           1927372 ns/op          655362 B/op          1 allocs/op
    PASS

    可以看到,内存消耗进步到了655362 B/op,内存分配次数从2次优化到了1次,至此CPU和Memory都优化完毕了。

pprof与服务型应用

  1. 访问:http://127.0.0.1:8080/debug/pprof/

  2. 启动压力测试:go-wrk -n 50000 http://127.0.0.1:8080/version

  3. 点击 web 页面的 profile 按钮,收集信息,30 s 后会自动下载 profile 文件

  4. 拿到 profile 文件后:go tool pprof -http=":8081" profile

定位api接口阻塞案例

编写如下有问题的代码,/push接口通过queue推送数据,但没有消费者,在推送1024次之后就会陷入阻塞:

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof"
	"runtime"
)

var queue chan string

func main() {
	runtime.SetBlockProfileRate(1)
	runtime.SetMutexProfileFraction(1)
	queue = make(chan string, 1024)
	//go consumer()
	http.HandleFunc("/push", pushData)
	http.HandleFunc("/test", test)
	if err := http.ListenAndServe(":6060", nil); err != nil {
		log.Fatal(err)
	}
}

func pushData(w http.ResponseWriter, r *http.Request) {
	queue <- "hello"
	w.WriteHeader(http.StatusOK)
	if _, err := w.Write([]byte("success")); err != nil {
		log.Println(err)
	}
}

func test(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
	if _, err := w.Write([]byte("hello")); err != nil {
		log.Println(err)
	}
}

func consumer() {
	for data := range queue {
		log.Println(data)
	}
}

运行服务,执行如下基准测试后陷入阻塞:

func BenchmarkSendRequest(b *testing.B) {
	for i := 0; i < b.N; i++ {
		resp, err := http.Get("http://127.0.0.1:6060/push")
		b.Log(resp, err)
	}
}

打开http://127.0.0.1:6060/debug/pprof/goroutine?debug=1,看到如下结果:

注:最好休息几分钟再做这个操作,因为刚执行完基准测试,还有1000多个正常处理了请求的存活的goroutine未被回收,会对定位问题造成干扰,等几分钟之后,阻塞的goroutine就会浮出水面了。

goroutine profile: total 9
3 @ 0x10382b6 0x1006825 0x10063dd 0x125cff8 0x120a66f 0x120c009 0x120d31b 0x1209397 0x10678c1
#	0x125cff7	main.pushData+0x37			/Users/zhhnzw/workspace/mygithub/go_demos/interview/block/block.go:25
#	0x120a66e	net/http.HandlerFunc.ServeHTTP+0x2e	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:2084
#	0x120c008	net/http.(*ServeMux).ServeHTTP+0x148	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:2462
#	0x120d31a	net/http.serverHandler.ServeHTTP+0x43a	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:2916
#	0x1209396	net/http.(*conn).serve+0x5d6		/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:1966

2 @ 0x10382b6 0x10319d7 0x1061de9 0x10a6a32 0x10a75da 0x10a75c8 0x115bf29 0x1166be5 0x12032ff 0x10678c1
#	0x1061de8	internal/poll.runtime_pollWait+0x88		/usr/local/Cellar/[email protected]/1.18.6/libexec/src/runtime/netpoll.go:302
#	0x10a6a31	internal/poll.(*pollDesc).wait+0x31		/usr/local/Cellar/[email protected]/1.18.6/libexec/src/internal/poll/fd_poll_runtime.go:83
#	0x10a75d9	internal/poll.(*pollDesc).waitRead+0x259	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/internal/poll/fd_poll_runtime.go:88
...

通过第一行,可以看到程序现存9goroutine,第二行开头数字3,表示有3goroutine的调用栈是如下段落的展示结果,再下一个锻炼开头数字2,表示接下来这个段落的调用栈有2goroutine在执行。

通过这一步已经可以发现问题了,block.go:25,含义是有3goroutine都执行到了block.go的第25行,意味着这里多半是发生了阻塞。但接下来还是进一步定位问题:

再打开http://127.0.0.1:6060/debug/pprof/goroutine?debug=2,看到如下结果:

goroutine 2126 [running]:
runtime/pprof.writeGoroutineStacks({0x1341040, 0xc000292000})
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/runtime/pprof/pprof.go:694 +0x70
runtime/pprof.writeGoroutine({0x1341040?, 0xc000292000?}, 0x1015d2f?)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/runtime/pprof/pprof.go:683 +0x2b
runtime/pprof.(*Profile).WriteTo(0x12a7a60?, {0x1341040?, 0xc000292000?}, 0xc?)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/runtime/pprof/pprof.go:332 +0x14b
net/http/pprof.handler.ServeHTTP({0xc000020521, 0x9}, {0x1342e90, 0xc000292000}, 0x2e302e302e373231?)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/pprof/pprof.go:253 +0x4a5
net/http/pprof.Index({0x1342e90?, 0xc000292000}, 0xc001170100)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/pprof/pprof.go:371 +0x13e
net/http.HandlerFunc.ServeHTTP(0x9d849a8?, {0x1342e90?, 0xc000292000?}, 0x100e165?)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:2084 +0x2f
net/http.(*ServeMux).ServeHTTP(0xc00002052b?, {0x1342e90, 0xc000292000}, 0xc001170100)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:2462 +0x149
net/http.serverHandler.ServeHTTP({0xc000e04210?}, {0x1342e90, 0xc000292000}, 0xc001170100)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:2916 +0x43b
net/http.(*conn).serve(0xc000110a00, {0x1343148, 0xc0001013e0})
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:1966 +0x5d7
created by net/http.(*Server).Serve
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:3071 +0x4db

goroutine 1 [IO wait]:
internal/poll.runtime_pollWait(0x9d84a98, 0x72)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/runtime/netpoll.go:302 +0x89
internal/poll.(*pollDesc).wait(0xc000164600?, 0x4?, 0x0)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/internal/poll/fd_poll_runtime.go:83 +0x32
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/internal/poll/fd_poll_runtime.go:88
internal/poll.(*FD).Accept(0xc000164600)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/internal/poll/fd_unix.go:614 +0x22c
net.(*netFD).accept(0xc000164600)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/fd_unix.go:172 +0x35
net.(*TCPListener).accept(0xc00000e210)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/tcpsock_posix.go:139 +0x28
net.(*TCPListener).Accept(0xc00000e210)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/tcpsock.go:288 +0x3d
net/http.(*Server).Serve(0xc00017e000, {0x1342ce0, 0xc00000e210})
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:3039 +0x385
net/http.(*Server).ListenAndServe(0xc00017e000)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:2968 +0x7d
net/http.ListenAndServe(...)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:3222
main.main()
	/Users/zhhnzw/workspace/mygithub/go_demos/interview/block/block.go:19 +0xdb

goroutine 2074 [chan send, 14 minutes]:
main.pushData({0x1342e90, 0xc0010dc0e0}, 0x0?)
	/Users/zhhnzw/workspace/mygithub/go_demos/interview/block/block.go:25 +0x38
net/http.HandlerFunc.ServeHTTP(0x9df1c40?, {0x1342e90?, 0xc0010dc0e0?}, 0x100e165?)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:2084 +0x2f
net/http.(*ServeMux).ServeHTTP(0x0?, {0x1342e90, 0xc0010dc0e0}, 0xc001170400)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:2462 +0x149
net/http.serverHandler.ServeHTTP({0xc001158a20?}, {0x1342e90, 0xc0010dc0e0}, 0xc001170400)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:2916 +0x43b
net/http.(*conn).serve(0xc001117180, {0x1343148, 0xc0001013e0})
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:1966 +0x5d7
created by net/http.(*Server).Serve
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:3071 +0x4db

goroutine 2104 [IO wait]:
internal/poll.runtime_pollWait(0x9d849a8, 0x72)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/runtime/netpoll.go:302 +0x89
internal/poll.(*pollDesc).wait(0xc00167e180?, 0xc000e04221?, 0x0)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/internal/poll/fd_poll_runtime.go:83 +0x32
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/internal/poll/fd_poll_runtime.go:88
internal/poll.(*FD).Read(0xc00167e180, {0xc000e04221, 0x1, 0x1})
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/internal/poll/fd_unix.go:167 +0x25a
net.(*netFD).Read(0xc00167e180, {0xc000e04221?, 0x0?, 0x0?})
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/fd_posix.go:55 +0x29
net.(*conn).Read(0xc0000ba020, {0xc000e04221?, 0x0?, 0x0?})
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/net.go:183 +0x45
net/http.(*connReader).backgroundRead(0xc000e04210)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:672 +0x3f
created by net/http.(*connReader).startBackgroundRead
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:668 +0xca

goroutine 2130 [chan send, 14 minutes]:
main.pushData({0x1342e90, 0xc0002920e0}, 0x0?)
	/Users/zhhnzw/workspace/mygithub/go_demos/interview/block/block.go:25 +0x38
net/http.HandlerFunc.ServeHTTP(0x9d846d8?, {0x1342e90?, 0xc0002920e0?}, 0x100e165?)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:2084 +0x2f
net/http.(*ServeMux).ServeHTTP(0x0?, {0x1342e90, 0xc0002920e0}, 0xc000973300)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:2462 +0x149
net/http.serverHandler.ServeHTTP({0xc000b63650?}, {0x1342e90, 0xc0002920e0}, 0xc000973300)
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:2916 +0x43b
net/http.(*conn).serve(0xc000111b80, {0x1343148, 0xc0001013e0})
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:1966 +0x5d7
created by net/http.(*Server).Serve
	/usr/local/Cellar/[email protected]/1.18.6/libexec/src/net/http/server.go:3071 +0x4db
...

这个页面的结果展示了所有存活的goroutine的调用栈。

第一个段落goroutine 2126 [running]含义是:id2126的这个goroutine,当前状态是running,往下是它的调用栈,从调用栈可以发现是pprof的功能,不用管,往下看第二个段落,goroutine 1 [IO wait]id1的这个goroutine状态为IO wait,从调用栈可以看出是http服务相关的,本段落最后一行block.go:19可以看出,是if err := http.ListenAndServe(":6060", nil); err != nil {这一行代码,这是需要阻塞在这里的,没毛病。

再接着往下看,goroutine 2130 [chan send, 14 minutes]id1的这个goroutine,在做chan send操作,已经执行了14 minutes,这显然是有问题的,从调用栈可以看到是在执行block.go:25,25行代码是queue <- "hello",到此,完全定位到了问题。

通过对比两次执行pprof的差异来快速锁定问题

刚刚执行完benchmark之后执行如下命令:

$ Fetching profile over HTTP from http://127.0.0.1:6060/debug/pprof/goroutine
Saved profile in /Users/zhhnzw/pprof/pprof.goroutine.013.pb.gz
Type: goroutine
Time: Jan 24, 2023 at 8:34pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 1028, 99.90% of 1029 total
Dropped 22 nodes (cum <= 5)
Showing top 10 nodes out of 17
      flat  flat%   sum%        cum   cum%
      1028 99.90% 99.90%       1028 99.90%  runtime.gopark
         0     0% 99.90%       1024 99.51%  bufio.(*Reader).ReadLine
         0     0% 99.90%       1024 99.51%  bufio.(*Reader).ReadSlice
         0     0% 99.90%       1024 99.51%  bufio.(*Reader).fill
         0     0% 99.90%       1026 99.71%  internal/poll.(*FD).Read
         0     0% 99.90%       1027 99.81%  internal/poll.(*pollDesc).wait
         0     0% 99.90%       1027 99.81%  internal/poll.(*pollDesc).waitRead
         0     0% 99.90%       1027 99.81%  internal/poll.runtime_pollWait
         0     0% 99.90%       1026 99.71%  net.(*conn).Read
         0     0% 99.90%       1026 99.71%  net.(*netFD).Read
(pprof)  

每次执行如上命令都会给出profile文件的路径,第二次执行加上-base参数指定上一次的profile文件,那么给出的分析结果就是2个pprof文件的差异。 在第一次执行go tool pprof http://127.0.0.1:6060/debug/pprof/goroutine之后发起一个curl http://127.0.0.1:6060/push,再执行如下操作:

$ go tool pprof -base /Users/zhhnzw/pprof/pprof.goroutine.013.pb.gz http://127.0.0.1:6060/debug/pprof/goroutine
Fetching profile over HTTP from http://127.0.0.1:6060/debug/pprof/goroutine
Saved profile in /Users/zhhnzw/pprof/pprof.goroutine.014.pb.gz
Type: goroutine
Time: Jan 24, 2023 at 7:56pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 3, 0.15% of 2061 total
Showing top 10 nodes out of 24
      flat  flat%   sum%        cum   cum%
         3  0.15%  0.15%          3  0.15%  runtime.gopark
         0     0%  0.15%          1 0.049%  bufio.(*Reader).ReadLine
         0     0%  0.15%          1 0.049%  bufio.(*Reader).ReadSlice
         0     0%  0.15%          1 0.049%  bufio.(*Reader).fill
         0     0%  0.15%          2 0.097%  internal/poll.(*FD).Read
         0     0%  0.15%          2 0.097%  internal/poll.(*pollDesc).wait
         0     0%  0.15%          2 0.097%  internal/poll.(*pollDesc).waitRead (inline)
         0     0%  0.15%          2 0.097%  internal/poll.runtime_pollWait
         0     0%  0.15%          1 0.049%  main.pushData
         0     0%  0.15%          2 0.097%  net.(*conn).Read
(pprof) list main.pushData
Total: 2061
ROUTINE ======================== main.pushData in /Users/zhhnzw/workspace/mygithub/go_demos/interview/block/block.go
         0          1 (flat, cum) 0.049% of Total
         .          .     20:           log.Fatal(err)
         .          .     21:   }
         .          .     22:}
         .          .     23:
         .          .     24:func pushData(w http.ResponseWriter, r *http.Request) {
         .          1     25:   queue <- "hello"
         .          .     26:   w.WriteHeader(http.StatusOK)
         .          .     27:   if _, err := w.Write([]byte("success")); err != nil {
         .          .     28:           log.Println(err)
         .          .     29:   }
         .          .     30:}

从如上结果就可以看出,有1个goroutine阻塞在了第25行。

Last updated

Was this helpful?