I have a very large, very old, very byzantine, very undocumented set of Fortran code that I am trying to troubleshoot. It is giving me divide-by-zero problems at run time due to a section that's roughly like this:
subroutine badsub(ainput)
implicit double precision (a-h,o-z)
include 'commonincludes2.h'
x=dsqrt((r(6)-r(8))**2+(z(6)-z(8))**2)
y=ainput
w=y+x
v=2./dlog(dsqrt(w/y))
This code hits divide by zero on the last line: x is zero, so w is equal to y, and thus dlog(dsqrt(1)) is zero.
The include file looks something like this:
common /cblk/ r(12),z(12),otherstuff
There are actually 3 include headers with a /cblk/ declaration, which I found by running grep -in "/cblk/" *.h *.f *.F: "commonincludes.h", "commonincludes2.h", and "commonincludes3.h". As an added bonus, the sections of memory corresponding to r and z are named x and y in "commonincludes.h", i.e. "commonincludes.h" looks like:
common /cblk/ x(12),y(12),otherstuff
My problem is, I have NO IDEA where r and z are set. I've used grep to find every place where each of the headers is included, and I don't see any place where the variables are written to.
If I inspect the actual values in r and z in gdb where the error occurs, the values look reasonable: they're non-zero, not-garbage-looking vectors of real numbers. It's just that r(6) equals r(8) and z(6) equals z(8), and that's what is causing the issue.
I need to find where z and r get written, but I can't find any instructions in the gdb documentation for attaching a watchpoint to a COMMON block. How can I find where these are written to?
CodePudding user response:
I think I have figured out how to do what I'm trying to do. Because COMMON variables are allocated statically, their addresses shouldn't change from run to run. Therefore, when my program stops due to my divide-by-zero error, I'm able to find the memory address of (in this example) r(8), which is global in scope and shouldn't change on subsequent runs. I can then re-run the code with a watchpoint on that address and it will flag when the value changes anywhere in the code.
In my example, the gdb session looks like this, with process names and directories filed off to protect the guilty:
Reading symbols from myprogram...
(gdb) r
Starting program: ************
Program received signal SIGFPE, Arithmetic exception.
0x00000000004df96d in badsub (ainput=1875.0000521766287) at badsub.f:109
109 v=2./dlog(dsqrt(w/y))
(gdb) p &r(8)
$1 = (PTR TO -> ( real(kind=8) )) 0xcbf7618 <cblk_+56>
(gdb) watch *(double precision *) 0x0cbf7618
Hardware watchpoint 1: *(double precision *) 0x0cbf7618
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: *************
Hardware watchpoint 1: *(double precision *) 0x0cbf7618
Old value = 0
New value = 6.123233995736766e-17
0x00007ffff6f2be2d in __memmove_avx_unaligned_erms () from /lib64/libc.so.6
I have confirmed from running a backtrace that this is indeed a place (presumably the first place) where my common block variable is being set.
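One thing to keep in mind when reading the backtrace: since "commonincludes.h" names the same storage x and y, the statement that writes r(8) may never mention r or z at all, which would explain why grep on those names finds nothing. The sketch below only illustrates that aliasing. The program and subroutine names (alias_demo, setter) are made up, the "otherstuff" members are left out, and it is written in free form (save it as a .f90 file) rather than the fixed form of the original code.

```fortran
! Illustration only: two program units give the same /cblk/ storage
! different names, so a write to x(8) shows up as a change in r(8).
program alias_demo
  implicit double precision (a-h,o-z)
  ! Same layout as "commonincludes2.h" (otherstuff omitted)
  common /cblk/ r(12), z(12)

  r(8) = 0.0d0
  z(8) = 0.0d0
  call setter()
  ! Neither r nor z is named inside setter, yet both have changed.
  print *, 'r(8) =', r(8), '  z(8) =', z(8)
end program alias_demo

subroutine setter()
  implicit double precision (a-h,o-z)
  ! Same layout as "commonincludes.h": the block is called x, y here
  common /cblk/ x(12), y(12)

  x(8) = 1.5d0   ! lands in the storage the caller calls r(8)
  y(8) = 2.5d0   ! lands in the storage the caller calls z(8)
end subroutine setter
```

Built with a Fortran compiler such as gfortran, this should report 1.5 for r(8) and 2.5 for z(8). An address watchpoint like the one above catches the write under either name, because it tracks the memory location rather than the variable.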