go tool pprof
has a -call_tree
option that, according to -help
, should “Create a context-sensitive call tree”. However, pprof -tree
on a CPU profile gives me the exact same output with and without this option. It looks like this (one representative node):
flat flat% sum% cum cum% calls calls% context
---------------------------------------------------------- -------------
0.07s 35.00% | google.golang.org/grpc/internal/transport.(*http2Server).operateHeaders
0.04s 20.00% | golang.org/x/net/http2.(*Framer).readMetaFrame
0.02s 10.00% | github.com/Shopify/sarama.(*FetchResponse).decode
0.06s 0.79% 51.18% 0.20s 2.63% | runtime.mapassign_faststr
0.05s 25.00% | runtime.newobject (inline)
0.03s 15.00% | aeshashbody
0.03s 15.00% | runtime.mallocgc
This is clearly not a tree, not least because it shows Sarama under HTTP/2 — consuming from Kafka (via Sarama) and serving HTTP/2 are two independent things this process does concurrently.
Why doesn’t -call_tree
affect this output of -tree
? In general, what does -call_tree
do?
Extra credit
What is the exact meaning of the output node I showed above? Does it mean that 35% of samples containing mapassign_faststr
also contained operateHeaders
somewhere up the stack, and 10% contained decode
? What about the lines below, such as mallocgc
?
What documents could I read to answer the above questions?
CodePudding user response:
Why doesn’t -call_tree affect this output of -tree
I think the -call_tree
option doesn't change the output of -tree
the outputs are not actually a tree, it outputs the nodes of the tree(more on this in the extra credit section).
In general, what does -call_tree do?
You can see a difference when you take a look at images generated with the -png
flag, without the -call_tree
flag:
And with the -call_tree
flag:
So instead of having 1 call tree, pprof attempts to create seperate trees based on context. In my case(will list them since the text on the image is not readable) the roots are:
testing.(*B).launch
(benchmark/test framework)runtime.gcBgMarkWorker
(part of the runtime GC)runtime.bgsweep
(part of the runtime GC)runtime.mcall
(part of the runtime scheduler)runtime.morestack
(something to do with the stack :) )
In the non -call_tree
image these nodes are still present, but start mid tree, as if our code calls these background processed directly.
Basically basically what the option does is remove/hide infrequent calls between functions, so you will end up with a tree for each set of functions that call each other frequently.
I haven't tested this but I imagine that pprof will do this context aware tree separation for user code as well. All in all it returns a subjectively more readable tree, or at least a more relevant one.
What is the exact meaning of the output node I showed above?
The -tree
option attempts to output the tree, as shows in the images. But since it is text output it shows you 1 node of the tree at a time, the non indented line in the context
column is the current node, the rows above are nodes that call the current node, the rows below the nodes which this one calls (the arrows in the image).
The calls%
is a the "weight" of the the incoming or outgoing edge, so indeed the percentage of calls from or to a function.
What documents could I read to answer the above questions?
If figured all of this out by looking at the source code, here are some key parts, in case you are interested:
- The file which does most of the output generation: https://github.com/google/pprof/blob/2007db6d4f53c44a417ddae675d50f56b8e8c2fd/internal/report/report.go
- The function for the
-tree
option: https://github.com/google/pprof/blob/2007db6d4f53c44a417ddae675d50f56b8e8c2fd/internal/report/report.go#L1047 - Line which explains when
-call_tree
is actually used: https://github.com/google/pprof/blob/2007db6d4f53c44a417ddae675d50f56b8e8c2fd/internal/report/report.go#L133