The problem is for LeetCode's Implement strStr
. I have a working solution when I compile and execute in my computer, it works fine, but when I try to run it with LeetCode it gives me some weird error. I don't have any idea what is the problem.
This is my solution:
#include <stdio.h>
#include <stdlib.h>
int strStr(char *haystack, char *needle) {
if (needle[0] == '\0')
return 0;
int i = 0;
int j = 0;
while (haystack[i] != '\0') {
while (haystack[i] == needle[j] && haystack[i] != '\0' && needle[j] != '\0') {
i ;
j ;
}
if (needle[j] == '\0') {
return i - j;
} else {
j = 0;
}
i ;
}
return -1;
}
int main() {
printf("%d\n", strStr("aaa", "aaaa"));
return 0;
}
and this is the error I am getting in LeetCode
=================================================================
==32==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000014 at pc 0x5620a8c4472e bp 0x7fff98a004c0 sp 0x7fff98a004b0
READ of size 1 at 0x602000000014 thread T0
#2 0x7fdce2ee00b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6 0x270b2)
0x602000000014 is located 0 bytes to the right of 4-byte region [0x602000000010,0x602000000014)
allocated by thread T0 here:
#0 0x7fdce3b25bc8 in malloc (/lib/x86_64-linux-gnu/libasan.so.5 0x10dbc8)
#3 0x7fdce2ee00b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6 0x270b2)
Shadow bytes around the buggy address:
0x0c047fff7fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c047fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c047fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c047fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c047fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c047fff8000: fa fa[04]fa fa fa 05 fa fa fa fa fa fa fa fa fa
0x0c047fff8010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==32==ABORTING
Please explain why this is occurring?
CodePudding user response:
The problem is you increment i
inside the inner loop and again in the outer loop, potentially skipping the null terminator, hence accessing bytes in haystack
beyond the end, which has undefined behavior.
You should only increment j
in the inner loop and compare haystack[i j] == needle[j]
.
Here is a modified version:
#include <stdio.h>
int strStr(const char *haystack, const char *needle) {
if (needle[0] == '\0')
return 0;
int i = 0;
while (haystack[i] != '\0') {
int j = 0;
while (needle[j] != '\0' && haystack[i j] == needle[j]) {
j ;
}
if (needle[j] == '\0') {
return i;
}
i ;
}
return -1;
}
int main() {
printf("%d\n", strStr("aaaaaaaaab", "aaaab"));
printf("%d\n", strStr("aaaaaaaa", "aaaaaaaaa"));
printf("%d\n", strStr("Hello world\n", "or"));
return 0;
}
Note that you can remove some redundant comparisons by reorganising the code:
int strStr(const char *haystack, const char *needle) {
for (int i = 0;; i ) {
for (int j = 0;; j ) {
if (needle[j] == '\0')
return i;
if (haystack[i j] != needle[j])
break;
}
if (haystack[i] == '\0')
return -1;
}
}
Note however that this method has a worst case time complexity of O(len(haystack)*len(needle)), with a pathological example of:
strStr("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", "aaaaaaaaaaaaaaaaaaaaaaab")
CodePudding user response:
In the function strStr()
, when i
and j
both equal to 3
, the nested while
loop exits because the condition haystack[i] != '\0'
becomes false
. Below in the code, it check for needle[j] == '\0'
which is false
as for j = 3
, needle[j]
is not equal to \0
. Then it increments i
which makes value of i
equal to 4
and the outer while
loop iterates and check the condition haystack[i] != '\0'
which results in accessing haystack
beyond the size of buffer it is pointing to because the valid index for the buffer that haystack
is pointing to is range from 0 - 3
(string - "aaa"
).
Since, you have posted AddressSanitizer output, below is the explanation of how to interpret ASan output and identify the problem:
The error reported by ASan is:
ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000014
buffer-overflow
means your program is trying to access memory beyond the size of array, precisely trying to access the string literal beyond its size1).
Check this statement in the ASan output:
0x602000000014 is located 0 bytes to the right of 4-byte region [0x602000000010,0x602000000014)
This means program attempting to access address 0x602000000014
which exists 0
bytes right to 4-byte region [0x602000000010,0x602000000014)
.
A point to note here for region [0x602000000010,0x602000000014)
- square brackets [
mean the end point is included, and round parentheses )
mean it's excluded.
The 4-byte region 0x602000000010 - 0x602000000013
contain the string literal "aaa"
which is pointed by haystack
pointer:
0x602000000014
0x602000000013 |
0x602000000012 | |
0x602000000011 | | |
0x602000000010 | | | |
| | | | |
----- ----- ----- ------ -----
| a | a | a | \0 | |
----- ----- ----- ------ -----
\ /
--------------------
|
4-byte region pointed
by haystack pointer
Note that ASan creates poisoned red zones at the edges of objects to detect overflows or underflows and during compilation ASan instruments the code to verify the shadow memory state at each memory access2). 1
byte of shadow memory keeps the track of 8
bytes of memory used by ASan instrumented program.
Now, check this section of ASan output:
Shadow bytes around the buggy address:
in the output, a memory region is highlighted with =>
:
=>0x0c047fff8000: fa fa[04]fa fa fa 05 fa fa fa fa fa fa fa fa fa
^^^^
In [04]
-
04
indicates that first four bytes of8
byte region (which are mapped with this byte in shadow memory) are addressable. That means the memory region from0x602000000010
to0x602000000013
contain"aaa"
(including null terminating character) are addressable.- square bracket
[]
indicates that your program trying to access the redone (basically, the memory which is not allowed to access) which are mapped to this byte in shadow memory.
Your program trying to access the byte just next to terminating null character \0
of string "aaa"
and ultimately attempting to access redzone and, hence, the ASan reporting it.
The other post already shown the better implementation of strStr()
. I am leaving it up to you to fix the problem in your code and optimise the implementation of strStr()
function.
A suggestion:
When compiling program with -fsanitize=address
, enable the debugging information as well (e.g. -g
option of gcc
compiler) and you will get line number and proper stack in the ASan output.
1). Not sure why it's reporting heap-buffer-overflow
for accessing string literal beyond its size. On my system ASan output for same program giving error - global-buffer-overflow
, which seems more appropriate as the string literals usually allocated in data segment but where they will be placed can vary based on underlying platform/architecture.