火焰图
安装go-torch
uber 开源工具go-torch
,可以直接读取 golang profiling 数据,并生成一个火焰图的 svg 文件。
go get -v github.com/uber/go-torch
火焰图 svg 文件可以通过浏览器打开,可以通过点击每个方块来 zoom in 检查它内部的内容。
火焰图的调用顺序从下到上,每个方块代表一个函数,它上面一层表示这个函数会调用哪些函数,方块的大小代表了占用 CPU 使用的长短。火焰图的配色并没有特殊的意义,默认的红、黄配色是为了更像火焰而已。
go-torch 工具的使用非常简单,没有任何参数的话,它会尝试从http://localhost:8080/debug/pprof/profile
获取 profiling 数据。它有三个常用的参数可以调整:
-u –url:要访问的 URL,这里只是主机和端口部分
-s –suffix:pprof profile 的路径,默认为 /debug/pprof/profile
–seconds:要执行 profiling 的时间长度,默认为 30s
安装 FlameGraph
要生成火焰图,需要事先安装 FlameGraph工具,这个工具的安装很简单(需要perl环境支持),只要把对应的可执行文件加入到环境变量中即可。
下载安装perl:https://www.perl.org/get.html
下载FlameGraph:git clone https://github.com/brendangregg/FlameGraph.git
将FlameGraph
目录加入到操作系统的环境变量中。
压测工具wrk
https://github.com/wg/wrk 或 https://github.com/adjust/go-wrk
火焰图与压测工具结合使用
使用wrk对version接口进行压测:
go-wrk -n 50000 http://127.0.0.1:8080/version
在上面压测进行的同时,打开另一个终端执行:
go-torch -u http://127.0.0.1:8080 -t 30
30秒之后终端会出现如下提示:
INFO[18:01:01] Run pprof command: go tool pprof -raw -seconds 30 http://127.0.0.1:8080/debug/pprof/profile
INFO[18:01:31] Writing svg to torch.svg
然后使用浏览器打开torch.svg
就能看到如下火焰图了:
火焰图的y轴表示cpu调用方法的先后,x轴表示在每个采样调用时间内,方法所占的时间百分比,越宽代表占据cpu时间越多。通过火焰图就可以更清楚的找出耗时长的函数调用,然后不断的修正代码,重新采样,不断优化。
此外还可以借助火焰图分析内存性能数据:
go-torch -inuse_space http://127.0.0.1:8080/debug/pprof/heap
go-torch -inuse_objects http://127.0.0.1:8080/debug/pprof/heap
go-torch -alloc_space http://127.0.0.1:8080/debug/pprof/heap
go-torch -alloc_objects http://127.0.0.1:8080/debug/pprof/heap
函数优化案例
执行go test -bench=Substr -v -cpuprofile=cpu.prof -benchmem
在性能测试的同时取得 CPU profiling 数据文件。
执行go tool pprof -http=":8081" cpu.prof
,访问http://localhost:8081
通过上图可以看出,大部分时间都花在了map相关操作
知道了性能瓶颈所在之后,优化源代码,用[]int
代替map
func lengthOfNonRepeatingSubStr1(s string) int {
lastOccurred := make([]int, 0xffff)
start := 0
maxLength := 0
for i, ch := range []rune(s) {
if lastI := lastOccurred[ch]; lastI > start {
start = lastI
}
if i-start+1 > maxLength {
maxLength = i - start + 1
}
lastOccurred[ch] = i + 1
}
return maxLength
}
再执行测试命令:
zhhnzw$ go test -bench=Substr -v -cpuprofile=cpu.prof -benchmem
=== RUN TestSubstr
--- PASS: TestSubstr (0.00s)
goos: darwin
goarch: amd64
cpu: Intel(R) Core(TM) i7-7920HQ CPU @ 3.10GHz
BenchmarkSubstr
nonrepeating_test.go:41: len(s) = 491520
nonrepeating_test.go:41: len(s) = 491520
nonrepeating_test.go:41: len(s) = 491520
BenchmarkSubstr-8 601 1968375 ns/op 1179663 B/op 2 allocs/op
PASS
可以看到,执行效率从5152279 ns/op
进步到了1968375 ns/op
,内存消耗从655590 B/op
退步到了1179663 B/op
,再执行go tool pprof -http=":8081" cpu.prof
,访问http://localhost:8081
可以看到,现在CPU性能消耗只剩下了stringtoslicerune
,涉及到的就是rune(s)
操作,这里是需要一个utf-8
的解码操作耗费了CPU计算时间,因为必须要支持中文,这个解码操作是省不了的,所以CPU的优化到此就完成了。
接下来再来看内存,观察到函数每次被调用每次都会创建一个slice
,这一点可以优化,把lastOccurred
变量提取到全局变量,在函数内每次都清掉它的值,这样这个slice
变量就只用初始化一次了
var lastOccurred = make([]int, 0xffff)
func lengthOfNonRepeatingSubStr2(s string) int {
for i,l:=0,len(lastOccurred);i<l;i++ {
lastOccurred[i]=0
}
start := 0
maxLength := 0
chs := []rune(s)
// 这也是个优化细节,用 i<len(chs) 会每次调用 len 函数,下面这样写,len 函数只会调用一次
for i,l:=0,len(chs);i<l;i++ {
if lastI := lastOccurred[chs[i]]; lastI > start {
start = lastI
}
if i-start+1 > maxLength {
maxLength = i - start + 1
}
lastOccurred[chs[i]] = i + 1
}
return maxLength
}
再执行测试命令:
zhhnzw$ go test -bench=Substr -v -cpuprofile=cpu.prof -benchmem
=== RUN TestSubstr
--- PASS: TestSubstr (0.00s)
goos: darwin
goarch: amd64
cpu: Intel(R) Core(TM) i7-7920HQ CPU @ 3.10GHz
BenchmarkSubstr
nonrepeating_test.go:41: len(s) = 491520
nonrepeating_test.go:41: len(s) = 491520
nonrepeating_test.go:41: len(s) = 491520
BenchmarkSubstr-8 620 1927372 ns/op 655362 B/op 1 allocs/op
PASS
可以看到,内存消耗进步到了655362 B/op
,内存分配次数从2次优化到了1次,至此CPU和Memory都优化完毕了。
pprof与服务型应用
访问:http://127.0.0.1:8080/debug/pprof/
启动压力测试:go-wrk -n 50000 http://127.0.0.1:8080/version
点击 web 页面的 profile 按钮,收集信息,30 s 后会自动下载 profile 文件
拿到 profile 文件后:go tool pprof -http=":8081" profile
定位api接口阻塞案例
编写如下有问题的代码,/push
接口通过queue
推送数据,但没有消费者,在推送1024
次之后就会陷入阻塞:
package main
import (
"log"
"net/http"
_ "net/http/pprof"
"runtime"
)
var queue chan string
func main() {
runtime.SetBlockProfileRate(1)
runtime.SetMutexProfileFraction(1)
queue = make(chan string, 1024)
//go consumer()
http.HandleFunc("/push", pushData)
http.HandleFunc("/test", test)
if err := http.ListenAndServe(":6060", nil); err != nil {
log.Fatal(err)
}
}
func pushData(w http.ResponseWriter, r *http.Request) {
queue <- "hello"
w.WriteHeader(http.StatusOK)
if _, err := w.Write([]byte("success")); err != nil {
log.Println(err)
}
}
func test(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
if _, err := w.Write([]byte("hello")); err != nil {
log.Println(err)
}
}
func consumer() {
for data := range queue {
log.Println(data)
}
}
运行服务,执行如下基准测试后陷入阻塞:
func BenchmarkSendRequest(b *testing.B) {
for i := 0; i < b.N; i++ {
resp, err := http.Get("http://127.0.0.1:6060/push")
b.Log(resp, err)
}
}
打开http://127.0.0.1:6060/debug/pprof/goroutine?debug=1
,看到如下结果:
注:最好休息几分钟再做这个操作,因为刚执行完基准测试,还有1000
多个正常处理了请求的存活的goroutine
未被回收,会对定位问题造成干扰,等几分钟之后,阻塞的goroutine
就会浮出水面了。
goroutine profile: total 9
3 @ 0x10382b6 0x1006825 0x10063dd 0x125cff8 0x120a66f 0x120c009 0x120d31b 0x1209397 0x10678c1
# 0x125cff7 main.pushData+0x37 /Users/zhhnzw/workspace/mygithub/go_demos/interview/block/block.go:25
# 0x120a66e net/http.HandlerFunc.ServeHTTP+0x2e /usr/local/Cellar/go@1.18/1.18.6/libexec/src/net/http/server.go:2084
# 0x120c008 net/http.(*ServeMux).ServeHTTP+0x148 /usr/local/Cellar/go@1.18/1.18.6/libexec/src/net/http/server.go:2462
# 0x120d31a net/http.serverHandler.ServeHTTP+0x43a /usr/local/Cellar/go@1.18/1.18.6/libexec/src/net/http/server.go:2916
# 0x1209396 net/http.(*conn).serve+0x5d6 /usr/local/Cellar/go@1.18/1.18.6/libexec/src/net/http/server.go:1966
2 @ 0x10382b6 0x10319d7 0x1061de9 0x10a6a32 0x10a75da 0x10a75c8 0x115bf29 0x1166be5 0x12032ff 0x10678c1
# 0x1061de8 internal/poll.runtime_pollWait+0x88 /usr/local/Cellar/go@1.18/1.18.6/libexec/src/runtime/netpoll.go:302
# 0x10a6a31 internal/poll.(*pollDesc).wait+0x31 /usr/local/Cellar/go@1.18/1.18.6/libexec/src/internal/poll/fd_poll_runtime.go:83
# 0x10a75d9 internal/poll.(*pollDesc).waitRead+0x259 /usr/local/Cellar/go@1.18/1.18.6/libexec/src/internal/poll/fd_poll_runtime.go:88
...
通过第一行,可以看到程序现存9
个goroutine
,第二行开头数字3
,表示有3
个goroutine
的调用栈是如下段落的展示结果,再下一个锻炼开头数字2
,表示接下来这个段落的调用栈有2
个goroutine
在执行。
通过这一步已经可以发现问题了,block.go:25
,含义是有3
个goroutine
都执行到了block.go
的第25
行,意味着这里多半是发生了阻塞。但接下来还是进一步定位问题:
再打开http://127.0.0.1:6060/debug/pprof/goroutine?debug=2
,看到如下结果:
goroutine 2126 [running]:
runtime/pprof.writeGoroutineStacks({0x1341040, 0xc000292000})
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/runtime/pprof/pprof.go:694 +0x70
runtime/pprof.writeGoroutine({0x1341040?, 0xc000292000?}, 0x1015d2f?)
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/runtime/pprof/pprof.go:683 +0x2b
runtime/pprof.(*Profile).WriteTo(0x12a7a60?, {0x1341040?, 0xc000292000?}, 0xc?)
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/runtime/pprof/pprof.go:332 +0x14b
net/http/pprof.handler.ServeHTTP({0xc000020521, 0x9}, {0x1342e90, 0xc000292000}, 0x2e302e302e373231?)
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/net/http/pprof/pprof.go:253 +0x4a5
net/http/pprof.Index({0x1342e90?, 0xc000292000}, 0xc001170100)
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/net/http/pprof/pprof.go:371 +0x13e
net/http.HandlerFunc.ServeHTTP(0x9d849a8?, {0x1342e90?, 0xc000292000?}, 0x100e165?)
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/net/http/server.go:2084 +0x2f
net/http.(*ServeMux).ServeHTTP(0xc00002052b?, {0x1342e90, 0xc000292000}, 0xc001170100)
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/net/http/server.go:2462 +0x149
net/http.serverHandler.ServeHTTP({0xc000e04210?}, {0x1342e90, 0xc000292000}, 0xc001170100)
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/net/http/server.go:2916 +0x43b
net/http.(*conn).serve(0xc000110a00, {0x1343148, 0xc0001013e0})
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/net/http/server.go:1966 +0x5d7
created by net/http.(*Server).Serve
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/net/http/server.go:3071 +0x4db
goroutine 1 [IO wait]:
internal/poll.runtime_pollWait(0x9d84a98, 0x72)
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/runtime/netpoll.go:302 +0x89
internal/poll.(*pollDesc).wait(0xc000164600?, 0x4?, 0x0)
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/internal/poll/fd_poll_runtime.go:83 +0x32
internal/poll.(*pollDesc).waitRead(...)
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/internal/poll/fd_poll_runtime.go:88
internal/poll.(*FD).Accept(0xc000164600)
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/internal/poll/fd_unix.go:614 +0x22c
net.(*netFD).accept(0xc000164600)
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/net/fd_unix.go:172 +0x35
net.(*TCPListener).accept(0xc00000e210)
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/net/tcpsock_posix.go:139 +0x28
net.(*TCPListener).Accept(0xc00000e210)
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/net/tcpsock.go:288 +0x3d
net/http.(*Server).Serve(0xc00017e000, {0x1342ce0, 0xc00000e210})
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/net/http/server.go:3039 +0x385
net/http.(*Server).ListenAndServe(0xc00017e000)
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/net/http/server.go:2968 +0x7d
net/http.ListenAndServe(...)
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/net/http/server.go:3222
main.main()
/Users/zhhnzw/workspace/mygithub/go_demos/interview/block/block.go:19 +0xdb
goroutine 2074 [chan send, 14 minutes]:
main.pushData({0x1342e90, 0xc0010dc0e0}, 0x0?)
/Users/zhhnzw/workspace/mygithub/go_demos/interview/block/block.go:25 +0x38
net/http.HandlerFunc.ServeHTTP(0x9df1c40?, {0x1342e90?, 0xc0010dc0e0?}, 0x100e165?)
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/net/http/server.go:2084 +0x2f
net/http.(*ServeMux).ServeHTTP(0x0?, {0x1342e90, 0xc0010dc0e0}, 0xc001170400)
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/net/http/server.go:2462 +0x149
net/http.serverHandler.ServeHTTP({0xc001158a20?}, {0x1342e90, 0xc0010dc0e0}, 0xc001170400)
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/net/http/server.go:2916 +0x43b
net/http.(*conn).serve(0xc001117180, {0x1343148, 0xc0001013e0})
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/net/http/server.go:1966 +0x5d7
created by net/http.(*Server).Serve
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/net/http/server.go:3071 +0x4db
goroutine 2104 [IO wait]:
internal/poll.runtime_pollWait(0x9d849a8, 0x72)
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/runtime/netpoll.go:302 +0x89
internal/poll.(*pollDesc).wait(0xc00167e180?, 0xc000e04221?, 0x0)
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/internal/poll/fd_poll_runtime.go:83 +0x32
internal/poll.(*pollDesc).waitRead(...)
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/internal/poll/fd_poll_runtime.go:88
internal/poll.(*FD).Read(0xc00167e180, {0xc000e04221, 0x1, 0x1})
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/internal/poll/fd_unix.go:167 +0x25a
net.(*netFD).Read(0xc00167e180, {0xc000e04221?, 0x0?, 0x0?})
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/net/fd_posix.go:55 +0x29
net.(*conn).Read(0xc0000ba020, {0xc000e04221?, 0x0?, 0x0?})
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/net/net.go:183 +0x45
net/http.(*connReader).backgroundRead(0xc000e04210)
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/net/http/server.go:672 +0x3f
created by net/http.(*connReader).startBackgroundRead
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/net/http/server.go:668 +0xca
goroutine 2130 [chan send, 14 minutes]:
main.pushData({0x1342e90, 0xc0002920e0}, 0x0?)
/Users/zhhnzw/workspace/mygithub/go_demos/interview/block/block.go:25 +0x38
net/http.HandlerFunc.ServeHTTP(0x9d846d8?, {0x1342e90?, 0xc0002920e0?}, 0x100e165?)
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/net/http/server.go:2084 +0x2f
net/http.(*ServeMux).ServeHTTP(0x0?, {0x1342e90, 0xc0002920e0}, 0xc000973300)
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/net/http/server.go:2462 +0x149
net/http.serverHandler.ServeHTTP({0xc000b63650?}, {0x1342e90, 0xc0002920e0}, 0xc000973300)
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/net/http/server.go:2916 +0x43b
net/http.(*conn).serve(0xc000111b80, {0x1343148, 0xc0001013e0})
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/net/http/server.go:1966 +0x5d7
created by net/http.(*Server).Serve
/usr/local/Cellar/go@1.18/1.18.6/libexec/src/net/http/server.go:3071 +0x4db
...
这个页面的结果展示了所有存活的goroutine
的调用栈。
第一个段落goroutine 2126 [running]
含义是:id
为2126
的这个goroutine
,当前状态是running
,往下是它的调用栈,从调用栈可以发现是pprof
的功能,不用管,往下看第二个段落,goroutine 1 [IO wait]
,id
为1
的这个goroutine
状态为IO wait
,从调用栈可以看出是http
服务相关的,本段落最后一行block.go:19
可以看出,是if err := http.ListenAndServe(":6060", nil); err != nil {
这一行代码,这是需要阻塞在这里的,没毛病。
再接着往下看,goroutine 2130 [chan send, 14 minutes]
,id
为1
的这个goroutine
,在做chan send
操作,已经执行了14 minutes
,这显然是有问题的,从调用栈可以看到是在执行block.go:25
,25行代码是queue <- "hello"
,到此,完全定位到了问题。
通过对比两次执行pprof的差异来快速锁定问题
刚刚执行完benchmark
之后执行如下命令:
$ Fetching profile over HTTP from http://127.0.0.1:6060/debug/pprof/goroutine
Saved profile in /Users/zhhnzw/pprof/pprof.goroutine.013.pb.gz
Type: goroutine
Time: Jan 24, 2023 at 8:34pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 1028, 99.90% of 1029 total
Dropped 22 nodes (cum <= 5)
Showing top 10 nodes out of 17
flat flat% sum% cum cum%
1028 99.90% 99.90% 1028 99.90% runtime.gopark
0 0% 99.90% 1024 99.51% bufio.(*Reader).ReadLine
0 0% 99.90% 1024 99.51% bufio.(*Reader).ReadSlice
0 0% 99.90% 1024 99.51% bufio.(*Reader).fill
0 0% 99.90% 1026 99.71% internal/poll.(*FD).Read
0 0% 99.90% 1027 99.81% internal/poll.(*pollDesc).wait
0 0% 99.90% 1027 99.81% internal/poll.(*pollDesc).waitRead
0 0% 99.90% 1027 99.81% internal/poll.runtime_pollWait
0 0% 99.90% 1026 99.71% net.(*conn).Read
0 0% 99.90% 1026 99.71% net.(*netFD).Read
(pprof)
每次执行如上命令都会给出profile
文件的路径,第二次执行加上-base
参数指定上一次的profile
文件,那么给出的分析结果就是2个pprof
文件的差异。 在第一次执行go tool pprof http://127.0.0.1:6060/debug/pprof/goroutine
之后发起一个curl http://127.0.0.1:6060/push
,再执行如下操作:
$ go tool pprof -base /Users/zhhnzw/pprof/pprof.goroutine.013.pb.gz http://127.0.0.1:6060/debug/pprof/goroutine
Fetching profile over HTTP from http://127.0.0.1:6060/debug/pprof/goroutine
Saved profile in /Users/zhhnzw/pprof/pprof.goroutine.014.pb.gz
Type: goroutine
Time: Jan 24, 2023 at 7:56pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 3, 0.15% of 2061 total
Showing top 10 nodes out of 24
flat flat% sum% cum cum%
3 0.15% 0.15% 3 0.15% runtime.gopark
0 0% 0.15% 1 0.049% bufio.(*Reader).ReadLine
0 0% 0.15% 1 0.049% bufio.(*Reader).ReadSlice
0 0% 0.15% 1 0.049% bufio.(*Reader).fill
0 0% 0.15% 2 0.097% internal/poll.(*FD).Read
0 0% 0.15% 2 0.097% internal/poll.(*pollDesc).wait
0 0% 0.15% 2 0.097% internal/poll.(*pollDesc).waitRead (inline)
0 0% 0.15% 2 0.097% internal/poll.runtime_pollWait
0 0% 0.15% 1 0.049% main.pushData
0 0% 0.15% 2 0.097% net.(*conn).Read
(pprof) list main.pushData
Total: 2061
ROUTINE ======================== main.pushData in /Users/zhhnzw/workspace/mygithub/go_demos/interview/block/block.go
0 1 (flat, cum) 0.049% of Total
. . 20: log.Fatal(err)
. . 21: }
. . 22:}
. . 23:
. . 24:func pushData(w http.ResponseWriter, r *http.Request) {
. 1 25: queue <- "hello"
. . 26: w.WriteHeader(http.StatusOK)
. . 27: if _, err := w.Write([]byte("success")); err != nil {
. . 28: log.Println(err)
. . 29: }
. . 30:}
从如上结果就可以看出,有1个goroutine
阻塞在了第25行。