I need to sort a thread dump by the value of the "cpu" field on each line. Example:
"GC Thread#7" os_prio=0 cpu=143.66ms elapsed=57.42s tid=0x0000000003c8b800 nid=0x1bf1 runnable
"G1 Refine#6" os_prio=0 cpu=0.15ms elapsed=29.03s tid=0x0000000006efb000 nid=0x1cda runnable
"G1 Refine#5" os_prio=0 cpu=0.27ms elapsed=29.03s tid=0x0000000005b5b800 nid=0x1cd9 runnable
"G1 Refine#0" os_prio=0 cpu=10.31ms elapsed=59.04s tid=0x0000000000f1c800 nid=0x1bd1 runnable
"GC Thread#4" os_prio=0 cpu=143.24ms elapsed=57.42s tid=0x0000000003c88800 nid=0x1bee runnable
"GC Thread#3" os_prio=0 cpu=146.71ms elapsed=57.42s tid=0x0000000004003800 nid=0x1bed runnable
I need the list sorted by cpu. With the list stored in a file, the following command works pretty well:
sort -k 3 cpu.txt
However, I found that it doesn't work with threads that have multiple spaces in their name:
"Weld Thread Pool -- 2" #145 prio=5 os_prio=0 cpu=8.66ms elapsed=49.56s tid=0x00000000088e2800 nid=0x1cb8 waiting on condition [0x00007ffa47d20000]
It seems I cannot use the key selection provided by "sort -k", as it treats spaces as column separators. Any idea what another option could be? Thanks!
Answer:
Using GNU awk:
$ gawk '{                                    # GNU awk
    for(i=1;i<=NF;i++)                       # loop all fields
        if($i~/^cpu=/) {                     # when cpu= found
            split($i,cpu,/=/)                # split @ = to get the time
            break                            # no need to search further
        }
    a[cpu[2]]=a[cpu[2]] $0 ORS               # append to time indexed array *
}
END {                                        # in the end
    PROCINFO["sorted_in"]="@ind_num_asc"     # sort on the index
    for(i in a)                              # loop all indexes
        printf "%s", a[i]                    # output
}' file
Output:
"G1 Refine#6" os_prio=0 cpu=0.15ms elapsed=29.03s tid=0x0000000006efb000 nid=0x1cda runnable
"G1 Refine#5" os_prio=0 cpu=0.27ms elapsed=29.03s tid=0x0000000005b5b800 nid=0x1cd9 runnable
"Weld Thread Pool -- 2" #145 prio=5 os_prio=0 cpu=8.66ms elapsed=49.56s tid=0x00000000088e2800 nid=0x1cb8 waiting on condition [0x00007ffa47d20000]
"G1 Refine#0" os_prio=0 cpu=10.31ms elapsed=59.04s tid=0x0000000000f1c800 nid=0x1bd1 runnable
...
* If several records have equal cpu times, they are appended to the same array cell in input order, since the question stated no requirement for ordering ties. Because this is GNU awk, it is possible to use arrays with two or more dimensions and sort further using the same logic (a[cpu[1]][etc[1]] ... for(i in a) for(j in a[i])).