How to debug a pointer running amok that writes to my variable?

Time:09-28

Crazy things are going on within embedded microcontroller software written in C11. (Yes, not C++, only C.) In summary, I have a static variable that is written in only two places in the whole software. As I should not post thousands of lines of code here, I'll summarize the code that writes to the variable:

#include <stdio.h>

static int foo = 0;
#define MAX_FOO 10

void bar_timer(void) {
    printf("foo: %d\n", foo);
    foo++;                  /* first of the two writes */
    if (foo >= MAX_FOO)
        foo = 0;            /* second write: wrap around */
}

To be clear, the real software is not as simple as the snippet above. foo is actually an index (int) inside an array of static structs, which contain integers, floats and a float pointer pointing to an area reserved using malloc. Nevertheless, the output from the printf goes like this:

foo: 0
foo: 2
foo: 4
foo: 6
foo: 8
foo: 0
foo: 2
...

Because the index foo is part of an array, I tried increasing the size of that array, which leads to crazy side effects: the software becomes incredibly slow or hangs at startup.

I also tried putting a printf("Before %d", foo) before the foo++ and a printf("After %d", foo) after the foo++, which results in:

Before 0
After  1
Before 2
After  3
...

Removing the foo++ results in:

Before 0
After  0
Before 0
After  0
...

I would bet that there is some pointer running amok (hidden in a bug somewhere in the thousands of lines of code) that increments the memory behind my variable foo.

How do I debug a pointer running amok that writes to my variable?

CodePudding user response:

First of all, static and malloc together don't make sense: your variable can't have both static storage and allocated storage at once. It must have one or the other.

How do I debug a pointer running amok that writes to my variable?

Depends on the target. The best way, available on modern MCUs only, is to use a hardware write breakpoint on the memory location; when the breakpoint hits, check the hardware instruction trace to find the culprit. You'll also need a decent debugger with trace support.

If you have all of the above, finding the bug takes some 5 to 10 minutes, as opposed to hours/days/weeks with conventional debugging. So this is definitely something to consider when picking an MCU and tool chain.

Assuming you don't have modern parts and equipment, some in-circuit debuggers may still be able to update a watched variable in real time. This might provide clues, particularly if you disable the code which is supposed to write to the variable and then find that it still gets written to.

The most likely culprit when getting completely random behavior is stack overflow. static variables are stored in .bss/.data (or .heap for PC programmers coding embedded systems...) but a common beginner mistake is to memory map the stack so that it overflows into other parts of the RAM, such as .bss. This is the first thing you should rule out.

The common trick to check for stack overflows is to launch the program in your debugger, set the whole stack area to a known value like 0xAA, then let the program run for a while and freeze it. Inspect the memory area where the stack is to see how much of your 0xAA cells it has chewed up.
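The same check can also be done in code rather than by hand in the debugger. The following is only a minimal sketch: the helper names stack_paint and stack_unused_bytes are made up for illustration, the region's base address and size would have to come from your own linker script, and it assumes a descending stack (so the lowest addresses are the last to be used).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL 0xAAu  /* the known fill value mentioned in the text */

/* Hypothetical helper: paint a stack region with the fill pattern.
   On a real target this must run before the region is in use
   (startup code), or be done from the debugger instead. */
void stack_paint(uint8_t *base, size_t size)
{
    assert(base != NULL);
    memset(base, STACK_FILL, size);
}

/* Hypothetical helper: count the bytes at the low end of the region
   that still hold the pattern. Assumes a descending stack, so these
   are the bytes the stack has never reached. */
size_t stack_unused_bytes(const uint8_t *base, size_t size)
{
    size_t n = 0;
    while (n < size && base[n] == STACK_FILL)
        n++;
    return n;
}
```

If stack_unused_bytes() returns 0 after the program has run for a while, the stack has used the whole region and has very likely overflowed into whatever is mapped below it.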

If it isn't a stack overflow but indeed some corrupt pointer or out-of-bounds array access, then start by checking the map file generated by the linker. Are there any large buffers or arrays living near your variable?

In case you suspect overruns, you can make your own "canaries", similar to filling the stack with 0xAA above. Declare some dummy volatile variables above or below the one getting corrupted and see if they change too.
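A rough sketch of such canaries follows. It is a variation on the idea, not the exact method described: separate dummy variables may be reordered by the linker, so this version wraps the variable and its guards in one struct, where C guarantees member order. The names watched, guard_lo, guard_hi and the CANARY value are all illustrative assumptions. Note that moving foo into a struct changes its address, so the corruption may move elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CANARY 0xAAAAAAAAu  /* arbitrary known pattern (assumption) */

/* Struct members are laid out in declaration order, so the guards
   really sit directly below and above foo (padding aside). */
static struct {
    volatile uint32_t guard_lo;
    volatile int      foo;
    volatile uint32_t guard_hi;
} watched = { CANARY, 0, CANARY };

/* True while neither guard has been overwritten. */
bool canaries_intact(void)
{
    return watched.guard_lo == CANARY && watched.guard_hi == CANARY;
}

/* Call this at strategic points; it traps as soon as a guard changes. */
void canaries_assert(void)
{
    assert(canaries_intact());
}
```

Which guard gets damaged hints at the direction of the overrun: guard_lo points to something overflowing upwards from lower addresses, guard_hi to something underflowing from above.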

Yet another possible culprit is a race condition, in case bar_timer() is executed from an ISR but the static variable is shared between the ISR and main(). Needless to say, you also shouldn't call bloated functions like the stdio.h ones from inside an ISR.
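If the variable really is shared between an ISR and main(), one way to rule out the race is C11 atomics. This is only a sketch under assumptions: that bar_timer() is the sole writer, that your toolchain ships <stdatomic.h>, and that atomic_int is lock-free on your MCU (check ATOMIC_INT_LOCK_FREE). The reader function foo_snapshot is a made-up name. On parts without suitable atomics, briefly disabling the interrupt around the access is the usual alternative.

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_FOO 10

/* Static storage: zero-initialized, which is a valid atomic state. */
static atomic_int foo;

/* Assumed to run in the timer ISR and to be the only writer.
   The wrapped value is computed first and stored in one step, so a
   reader never sees the transient value MAX_FOO. No printf here. */
void bar_timer(void)
{
    int next = atomic_load(&foo) + 1;
    if (next >= MAX_FOO)
        next = 0;
    atomic_store(&foo, next);
}

/* Hypothetical reader for main(): returns one consistent value. */
int foo_snapshot(void)
{
    int v = atomic_load(&foo);
    assert(v >= 0 && v < MAX_FOO);  /* range check catches corruption */
    return v;
}
```

If the range check in foo_snapshot() ever fires with this version in place, a race between the ISR and main() is no longer a plausible explanation, which points back to a stray write from somewhere else.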

When everything else fails, you have to resort to single-stepping through the program until the point where the bug first manifests itself.

In the end, the best way to avoid bugs like this is to not write them in the first place. Which is achieved by disciplined use of coding standards above all else. For example you mention malloc, which shouldn't be used in microcontroller applications, mainly because it doesn't make the slightest sense to use it. But if you use it anyway, you'll also have the usual PC programming problems like memory leaks, heap corruption and fragmentation etc.
