Home > database >  heap-buffer-overflow for Leetcode strStr problem
heap-buffer-overflow for Leetcode strStr problem

Time:07-14

The problem is for LeetCode's Implement strStr. I have a working solution when I compile and execute in my computer, it works fine, but when I try to run it with LeetCode it gives me some weird error. I don't have any idea what is the problem.

This is my solution:

#include <stdio.h>
#include <stdlib.h>

int strStr(char *haystack, char *needle) {
    if (needle[0] == '\0')
        return 0;

    int i = 0;
    int j = 0;

    while (haystack[i] != '\0') {
        while (haystack[i] == needle[j] && haystack[i] != '\0' && needle[j] != '\0') {
            i  ;
            j  ;
        }

        if (needle[j] == '\0') {
            return i - j;
        } else {
            j = 0;
        }

        i  ;
    }
    return -1;
}

int main() {

    printf("%d\n", strStr("aaa", "aaaa"));

    return 0;
}

and this is the error I am getting in LeetCode

=================================================================
==32==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000014 at pc 0x5620a8c4472e bp 0x7fff98a004c0 sp 0x7fff98a004b0
READ of size 1 at 0x602000000014 thread T0
    #2 0x7fdce2ee00b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6 0x270b2)
0x602000000014 is located 0 bytes to the right of 4-byte region [0x602000000010,0x602000000014)
allocated by thread T0 here:
    #0 0x7fdce3b25bc8 in malloc (/lib/x86_64-linux-gnu/libasan.so.5 0x10dbc8)
    #3 0x7fdce2ee00b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6 0x270b2)
Shadow bytes around the buggy address:
  0x0c047fff7fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c047fff8000: fa fa[04]fa fa fa 05 fa fa fa fa fa fa fa fa fa
  0x0c047fff8010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==32==ABORTING

Please explain why this is occurring?

CodePudding user response:

The problem is you increment i inside the inner loop and again in the outer loop, potentially skipping the null terminator, hence accessing bytes in haystack beyond the end, which has undefined behavior.

You should only increment j in the inner loop and compare haystack[i j] == needle[j].

Here is a modified version:

#include <stdio.h>

int strStr(const char *haystack, const char *needle) {
    if (needle[0] == '\0')
        return 0;

    int i = 0;

    while (haystack[i] != '\0') {
        int j = 0;
        while (needle[j] != '\0' && haystack[i   j] == needle[j]) {
            j  ;
        }
        if (needle[j] == '\0') {
            return i;
        }
        i  ;
    }
    return -1;
}

int main() {
    printf("%d\n", strStr("aaaaaaaaab", "aaaab"));
    printf("%d\n", strStr("aaaaaaaa", "aaaaaaaaa"));
    printf("%d\n", strStr("Hello world\n", "or"));
    return 0;
}

Note that you can remove some redundant comparisons by reorganising the code:

int strStr(const char *haystack, const char *needle) {
    for (int i = 0;; i  ) {
        for (int j = 0;; j  ) {
            if (needle[j] == '\0')
                return i;
            if (haystack[i   j] != needle[j])
                break;
        }
        if (haystack[i] == '\0')
            return -1;
    }
}

Note however that this method has a worst case time complexity of O(len(haystack)*len(needle)), with a pathological example of:

strStr("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", "aaaaaaaaaaaaaaaaaaaaaaab")

CodePudding user response:

In the function strStr(), when i and j both equal to 3, the nested while loop exits because the condition haystack[i] != '\0' becomes false. Below in the code, it check for needle[j] == '\0' which is false as for j = 3, needle[j] is not equal to \0. Then it increments i which makes value of i equal to 4 and the outer while loop iterates and check the condition haystack[i] != '\0' which results in accessing haystack beyond the size of buffer it is pointing to because the valid index for the buffer that haystack is pointing to is range from 0 - 3 (string - "aaa").

Since, you have posted AddressSanitizer output, below is the explanation of how to interpret ASan output and identify the problem:

The error reported by ASan is:

ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000014

buffer-overflow means your program is trying to access memory beyond the size of array, precisely trying to access the string literal beyond its size1).

Check this statement in the ASan output:

0x602000000014 is located 0 bytes to the right of 4-byte region [0x602000000010,0x602000000014)

This means program attempting to access address 0x602000000014 which exists 0 bytes right to 4-byte region [0x602000000010,0x602000000014).

A point to note here for region [0x602000000010,0x602000000014) - square brackets [ mean the end point is included, and round parentheses ) mean it's excluded.

The 4-byte region 0x602000000010 - 0x602000000013 contain the string literal "aaa" which is pointed by haystack pointer:

                         0x602000000014
                  0x602000000013      |
            0x602000000012     |      |
      0x602000000011     |     |      |
0x602000000010     |     |     |      |
             |     |     |     |      |
             ----- ----- ----- ------ ----- 
            |  a  |  a  |  a  |  \0  |     |
             ----- ----- ----- ------ ----- 
             \                      /
               -------------------- 
                        |
             4-byte region pointed 
              by haystack pointer

Note that ASan creates poisoned red zones at the edges of objects to detect overflows or underflows and during compilation ASan instruments the code to verify the shadow memory state at each memory access2). 1 byte of shadow memory keeps the track of 8 bytes of memory used by ASan instrumented program.

Now, check this section of ASan output:

Shadow bytes around the buggy address:

in the output, a memory region is highlighted with => :

=>0x0c047fff8000: fa fa[04]fa fa fa 05 fa fa fa fa fa fa fa fa fa
                       ^^^^

In [04] -

  • 04 indicates that first four bytes of 8 byte region (which are mapped with this byte in shadow memory) are addressable. That means the memory region from 0x602000000010 to 0x602000000013 contain "aaa" (including null terminating character) are addressable.
  • square bracket [] indicates that your program trying to access the redone (basically, the memory which is not allowed to access) which are mapped to this byte in shadow memory.

Your program trying to access the byte just next to terminating null character \0 of string "aaa" and ultimately attempting to access redzone and, hence, the ASan reporting it.

The other post already shown the better implementation of strStr(). I am leaving it up to you to fix the problem in your code and optimise the implementation of strStr() function.

A suggestion:

When compiling program with -fsanitize=address, enable the debugging information as well (e.g. -g option of gcc compiler) and you will get line number and proper stack in the ASan output.


1). Not sure why it's reporting heap-buffer-overflow for accessing string literal beyond its size. On my system ASan output for same program giving error - global-buffer-overflow, which seems more appropriate as the string literals usually allocated in data segment but where they will be placed can vary based on underlying platform/architecture.

2). AddressSanitizer - How it works?

  • Related