Why is address sanitizer not indicating a memory leak after malloc() memory was not freed?

Time:03-10

(I did not write this code; my professor did.) I was looking at some code that my professor wrote, and it all made sense to me except for one thing. Because we were running out of time, he did not bother to free any of the memory; however, he was compiling with AddressSanitizer on. But when he ran the code, no AddressSanitizer error or warning was shown.

We were running gcc 9.3 on an Ubuntu machine. When I comment out the call to add_line, it reports a leak, but only for crnt. I guess lines does not throw a memory leak because it was declared in the global space? But why doesn't crnt throw a memory leak when the add_line function is called?

(Also, here are the compile flags that are used. -g -std=c99 -Wall -Wvla -fsanitize=address,undefined)

Here is the code:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>

#define DEBUG 1

#define BUFSIZE 8
#define LISTLEN 16

char **lines;
int line_count, line_array_size;

void add_line(char *p)
{
    if (DEBUG) printf("Adding |%s|\n", p);
    if (line_count == line_array_size) {
        line_array_size *= 2;
        lines = realloc(lines, line_array_size * sizeof(char *));
        // TODO: check whether lines is NULL
    }

    lines[line_count] = p;
    line_count++;
}

int main(int argc, char **argv)
{
    int fd, bytes;
    char buf[BUFSIZE];
    char *crnt;
    int len;
    int pos, start;

    // TODO: move array list management to separate functions
    lines = malloc(sizeof(char *) * LISTLEN);
    if (!lines) {
        printf("malloc failed\n");
        return EXIT_FAILURE;
    }

    line_array_size = LISTLEN;
    line_count = 0;

    if (argc > 1) {
        fd = open(argv[1], O_RDONLY);
        if (fd == -1) {
            perror(argv[1]);
            return EXIT_FAILURE;
        }
    } else {
        fd = 0;
    }

    crnt = NULL;
    len = 0;
    while ((bytes = read(fd, buf, BUFSIZE)) > 0) {
        // read buffer and break file into lines

        start = 0;
        for (pos = 0; pos < bytes; pos++) {
            if (buf[pos] == '\n') {
                if (crnt == NULL) {
                    len = pos - start;
                    crnt = malloc(len + 1);
                    memcpy(crnt, &buf[start], len);
                } else {
                    len += pos;
                    crnt = realloc(crnt, len + 1);
                    memcpy(&crnt[len - pos], buf, pos);
                }
                crnt[len] = '\0';
                // add_line(crnt); <------------- When I uncomment this line, no address-sanitizer leak is detected. With this line commented, asan does throw a leak only for the crnt variable. Why is that?
                crnt = NULL;
                start = pos + 1;
            }
        }

        if (start < pos) {
            if (crnt == NULL) {
                len = pos - start;
                crnt = malloc(len + 1);
                memcpy(crnt, &buf[start], len);
            } else {
                int newlen = len + (pos - start);
                crnt = realloc(crnt, newlen + 1);
                memcpy(&crnt[len], &buf[start], pos - start);
                len = newlen;
            }
            crnt[len] = '\0';  // technically unnecessary
        }
    }
    if (bytes == -1) {
        perror("read");
        return EXIT_FAILURE;
    }

    // if we reach here, we have read the entire file
    // sort and print the list
    

    return 0;
}

Answer:

The issue here is the definition of "memory leak". I would have liked to have quoted a section in LeakSanitizer's documentation where it offers a clear and precise definition of the concept, which seems fundamental to its operation, but I couldn't find one, so you'll have to bear with a bit of projection on my part.

A region of dynamically allocated (i.e. with malloc or friends) memory has leaked when there is no possible way for it to be freed. In other words, if your program allocates memory and throws away the address before the allocation has been free'd, the memory has leaked.

That's subtly different from what you might think the definition is. You might think that memory has leaked if your program terminates without freeing every block of memory it allocated. That's certainly a possible definition, and I'm not going to criticise it (much), but it's actually not very precise.

At what point has the program terminated? It hasn't really terminated when main() returns, because you might still have clean-up functions registered with atexit(), and those functions don't execute until after main() returns. (Or when exit() is called, which is effectively the same thing.) It's actually pretty common (though, to my mind, pointless) to use atexit() functions precisely in order to free() objects which might not have been deallocated before exit().

OK, so you can't decide whether an allocation has leaked by checking whether it has been freed when main() returns. If you want to do it that way, you need to defer the test until really the last possible moment. But at what's really the last possible moment, the process is about to cease to exist and the operating system is going to reclaim all the memory used by the process, including whatever resources were acquired by the memory allocation library. So at the last possible moment, there is no memory leak, because there is no memory.

(There are embedded systems which have no concept of separate process memory, etc., and so what I wrote above might not apply to every possible computing system. But it applies to everything on which AddressSanitizer is implemented.)

A key point is that the atexit() handler needs to be able to find the objects it is cleaning up, and since it executes after main() has terminated, it cannot use any automatic (i.e. stack-allocated) object. Only objects with static lifetime are available to it. So for it to be able to do its task, the address of the object to be cleaned up on termination must be stored in global memory. If the region's address is not stored somewhere persistent, the memory has leaked (as per my definition above) and we don't actually have to wait to see whether an atexit() handler manages to free the memory.

Which brings us back to what I claim is a workable definition of a memory leak: dynamically allocated memory whose address is no longer stored anywhere in the running program. That memory region can no longer be used, so it's garbage, but it cannot be freed because the program doesn't know what its address is.

Your lines array is a global variable. Indeed, you point that out in your question:

I guess lines does not throw a memory leak because it was declared in the global space?

That's correct. lines is a global variable, so its contents are still accessible even after main() returns. Not only are its contents accessible, so is any memory pointed to by some object in the array it points to. You could, if you wanted to, free the saved lines in an atexit handler:

void cleanup(void) {
  for (int i = 0; i < line_count; ++i) { free(lines[i]); }
  free(lines);
}

(To use that, you only need to call atexit(cleanup) just after you initialise lines and line_count.)
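
To make the registration step concrete, here is a minimal sketch of how the pieces might fit together. The names init_lines and add_line_checked are hypothetical helpers invented for this illustration, not part of the original program; the globals mirror the ones in the question. The sketch also checks the result of realloc, which the original code leaves as a TODO.

```c
#include <stdlib.h>

#define LISTLEN 16

/* Globals have static storage duration, so an atexit() handler can
   still reach them after main() has returned. */
char **lines;
int line_count, line_array_size;

/* Frees every stored line, then the array itself. Intended to run
   as an atexit() handler. */
void cleanup(void)
{
    for (int i = 0; i < line_count; ++i) {
        free(lines[i]);
    }
    free(lines);
    lines = NULL;
    line_count = line_array_size = 0;
}

/* Hypothetical helper: allocates the list and registers cleanup().
   Returns 0 on success, -1 on failure. */
int init_lines(void)
{
    lines = malloc(sizeof(char *) * LISTLEN);
    if (!lines) {
        return -1;
    }
    line_array_size = LISTLEN;
    line_count = 0;
    if (atexit(cleanup) != 0) {
        free(lines);
        lines = NULL;
        return -1;
    }
    return 0;
}

/* Hypothetical variant of add_line() that checks realloc. On success
   the list owns p; on failure the caller still owns it. */
int add_line_checked(char *p)
{
    if (!lines) {
        return -1;  /* init_lines() has not succeeded yet */
    }
    if (line_count == line_array_size) {
        int new_size = line_array_size * 2;
        char **grown = realloc(lines, new_size * sizeof(char *));
        if (!grown) {
            return -1;  /* lines is untouched and still valid */
        }
        lines = grown;
        line_array_size = new_size;
    }
    lines[line_count++] = p;
    return 0;
}
```

In main() you would call init_lines() once at the start and add_line_checked(crnt) wherever the original calls add_line(crnt). Everything stored in lines is then released when the process exits normally.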

So that brings us to:

But why doesn't crnt throw a memory leak when the add_line function is called?

crnt contains the address of a dynamically-allocated buffer which contains the current line. If you call add_line(crnt), that pointer is stored in lines. So it's available for the clean-up function, as above. You can set crnt to NULL at your convenience, because it is no longer the only pointer to that buffer.

But if you don't call add_line, then crnt is the only pointer to that buffer and when you set crnt to NULL, there is no longer a pointer to the buffer. The buffer has leaked and AddressSanitizer is there to tell you about it. (AddressSanitizer would have caught the problem even if you hadn't set crnt to NULL, because crnt ceases to exist when main() returns or calls exit(), and at that point the address has been lost. Or if you overwrite crnt with a different allocation's address.)
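
Put another way: before a local pointer is cleared, overwritten, or goes out of scope, the buffer should either be handed to something with static lifetime or be freed. The sketch below illustrates that rule with a hypothetical helper, finish_line, and a hypothetical global, last_line; neither exists in the original program, and this is only one possible way to express the idea.

```c
#include <stdlib.h>

/* Hypothetical global owner: still reachable after main() returns. */
char *last_line;

/* Call this just before dropping a local pointer to a malloc'd buffer.
   keep != 0: hand the buffer to last_line (freeing the previous one),
              so a pointer to it survives in static storage.
   keep == 0: free the buffer, so there is nothing left to leak.
   Always returns NULL, so the caller can write
       crnt = finish_line(crnt, keep);                                */
char *finish_line(char *buf, int keep)
{
    if (keep) {
        free(last_line);  /* free(NULL) is a no-op */
        last_line = buf;
    } else {
        free(buf);
    }
    return NULL;
}
```

Either branch leaves the program in a state where no allocated buffer has lost its last pointer, which is the condition the leak check is looking at.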

For a much simpler example, try these two very similar programs:

Memory leak

#include <stdlib.h>
int main(void) {
  void* megabyte = malloc(1<<20);
  (void)megabyte; /* Suppress unused variable warning */
}

No memory leak

#include <stdlib.h>
void* megabyte;
int main(void) {
  megabyte = malloc(1<<20);
}

Note that Valgrind's memcheck tool can also report memory that is never freed but is still reachable at what Valgrind considers the end of execution, like megabyte in the second example. It doesn't do so by default, though. If you run Valgrind on the second program with the flags --show-leak-kinds=all --leak-check=full, it will report that a megabyte of memory is "still reachable". (To try Valgrind, you have to compile the program without AddressSanitizer, I believe. The two tools are not completely compatible.)
