I have a GDB Python macro that walks through data in a core file generated by a C program. The macro can take a long time to run. It walks a list of struct pointers, reading each pointer into a gdb.Value. The majority of the time is spent when the first field of that struct is accessed, because gdb.Value is lazy: the first access fetches the entire structure from the core.
The hardcoded values below actually represent a mild case: this core took 8 seconds to walk with the code below and print the resulting table of strings. Larger cores can take several minutes.
i.e.:

```python
for i in range(64):
    structureB = structureA[i]
    for j in range(6144):
        # sizeof(gdb_val) == 768
        gdb_val = structureB['pointer']
        if True:  # flipped to False to try the read_memory variant below
            # Majority of the time is spent here:
            # this reads the entire structure into gdb_val.
            if gdb_val['data1'] != some_constant:
                return
        else:
            # Was hoping this would be faster because it accesses the
            # data directly, but the total time is unchanged.
            # The return below is hit the majority of the time:
            # only 1391 of the 393216 iterations get past it (0.35%).
            # This line itself is not eating up time.
            data_gdb = gdb_val['data1']
            # The rest of this is eating up time.
            mem = self.inferior.read_memory(data_gdb.address, 1)
            data = int.from_bytes(mem, byteorder="little")
            if data != some_constant:
                return
        # This is faster because the data is already loaded.
        if gdb_val['data2']:
            do_something()
        # parse out several members of gdb_val
        # create a string from several members
        # append the string to a table
```
In addition to `read_memory`, I tried these two with similar results:

```python
gdb.parse_and_eval("(char *) 0x%x" % data_gdb.address)
```

and

```python
gdb.execute("output/d *(char *)%s" % data_gdb.address, True, True)
```
Any other way to access that one byte faster?
Are there other ways to analyze core files that may be faster?
i.e. writing a C library that walks through the data? I tried searching for that, but I only find tutorials on how to analyze C core files; I hope it is just that my search skills are lacking and something along those lines exists. (I have seen example code where Python loads a C library to parse data, but that data was passed into the library. I haven't seen an example where C has direct access to GDB or to the core file's binary data.)
CodePudding user response:
> Are there other ways to analyze core files that may be faster?

A core file is logically just an image of the process in memory at the time the core was created. You can use `read()` to access the data as fast as your disk allows.

The trick is to find where in memory the data you care about was. Once you know that, finding that data in an ELF core file is trivial (you are on an ELF platform, right?).
Iterate over the `LOAD` segments in the core (using e.g. libelf from http://elfutils.org), find the `LOAD` segment which "covers" your address, then read the data from the core at offset `pt_load.p_offset + (address - pt_load.p_vaddr)`.
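As an illustration of that offset arithmetic, here is a minimal pure-Python sketch of the same lookup the answer describes with libelf. This is an assumption-laden sketch, not the answer's code: it hardcodes a little-endian ELF64 core and does no error checking.

```python
import struct

PT_LOAD = 1

def read_core_byte(core_path, addr):
    # Sketch only: assumes a little-endian ELF64 core, no validation.
    with open(core_path, "rb") as f:
        ehdr = f.read(64)                                # ELF64 header
        (e_phoff,) = struct.unpack_from("<Q", ehdr, 32)  # program header table offset
        e_phentsize, e_phnum = struct.unpack_from("<HH", ehdr, 54)
        for i in range(e_phnum):
            f.seek(e_phoff + i * e_phentsize)
            p_type, _flags, p_offset, p_vaddr, _paddr, p_filesz = \
                struct.unpack("<IIQQQQ", f.read(40))
            # Find the LOAD segment that "covers" addr, then apply
            # p_offset + (addr - p_vaddr) as described above.
            if p_type == PT_LOAD and p_vaddr <= addr < p_vaddr + p_filesz:
                f.seek(p_offset + (addr - p_vaddr))
                return f.read(1)[0]
        return None
```

A C version built on libelf (elf_getphdrnum / gelf_getphdr) has the same shape; the standard-library Python version is shown only to keep all examples in one language.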
Update:
> I'll be examining multiple cores, across different past versions of the C code. The data structures may differ slightly between versions, and I wouldn't want changes in their sizes/offsets to force me to handle each version differently.
Like I said, the hard part is knowing where in memory the data was. That's the value-add of GDB -- it decodes the debug info for you.
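You can also use GDB purely to recover the layout once per binary version, which addresses the concern about shifting sizes/offsets: dump the offsets with a short script, then feed them to the raw-core reader. A minimal sketch, assuming a hypothetical type name `struct node`; run it inside a GDB session against the matching binary:

```python
import json
import gdb  # only available inside a GDB session

# "struct node" is a hypothetical placeholder for the real struct type.
t = gdb.lookup_type("struct node")
layout = {
    "sizeof": t.sizeof,
    # Field.bitpos is the field offset in bits; divide by 8 for bytes.
    "offsets": {f.name: f.bitpos // 8 for f in t.fields()},
}
with open("layout.json", "w") as out:
    json.dump(layout, out)
```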
While you can use libdw to read the DWARF debug info from your binaries (just as GDB would), depending on the complexity of your data, it may or may not be worth your time to develop a custom solution.

If "multiple cores" means 10s of cores, you are probably better off just using GDB. If you have 1000s of them, and each takes 1 hour to process, you may be better off with a custom solution.
If the layout of `structureA` and `structureB` is pretty simple, a custom solution may take a few days to develop. If inside `structX` there are unions, packed arrays, variable-sized buffers and bit-fields, a custom solution may take 3 weeks.
Update 2:
> On a tangent, I tried using ThreadPoolExecutor, but it hits an error ("Unexpected lazy value type"). Before I delve into that, do you know whether GDB supports running concurrent threads?
I don't believe there is any locking inside GDB. If you "inject" threads into GDB via Python and then start calling back into GDB proper in parallel, you are going to have a bad time.
I just looked, and it turns out that my belief is outdated: there is some locking since commit d55c9a68473d.
However, that locking is for the very specific purpose of the commit ("Demangle minsyms in parallel") and is not at all general, so I believe the "you are going to have a bad time" conclusion stands.
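If you do need parallelism, a safe pattern (my suggestion, not part of the answer above) is one GDB process per core file instead of threads inside a single GDB. A sketch, where `walk.py` and `./myprog` are hypothetical stand-ins for your macro script and binary:

```python
import concurrent.futures
import subprocess
import sys

def walk_one(core_path):
    # Each worker runs its own batch-mode GDB, so no GDB state is shared
    # between threads and the locking question never arises.
    result = subprocess.run(
        ["gdb", "--batch", "-x", "walk.py", "-c", core_path, "./myprog"],
        capture_output=True, text=True)
    return core_path, result.stdout

if __name__ == "__main__":
    # Threads are fine here: each task just blocks on a child process.
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as ex:
        for core, table in ex.map(walk_one, sys.argv[1:]):
            print(core, table)
```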