The code below consists of read_files(), which reads a set of text files, and match(), which tests a string against a pattern using the GNU regex library. Inside read_files() I call getline() with the size argument set to 0, so that getline() starts with its default buffer size of 120 and grows the buffer as needed.
#include <limits.h> // for PATH_MAX
#include <regex.h> // for regcomp, regerror, regexec, regfree, size_t, REG...
#include <stdio.h> // for printf, fprintf, NULL, fclose, fopen, getline
#include <stdlib.h> // for exit, free, EXIT_FAILURE
int match(const char *regex_str, const char *str) {
regex_t regex;
int reti;
char msgbuf[100];
/* Compile regular expression */
reti = regcomp(&regex, regex_str, REG_EXTENDED);
if (reti) {
fprintf(stderr, "Could not compile regex\n");
exit(1);
}
/* Execute regular expression */
reti = regexec(&regex, str, 0, NULL, 0);
if (!reti) {
return 1;
} else if (reti == REG_NOMATCH) {
return 0;
} else {
regerror(reti, &regex, msgbuf, sizeof(msgbuf));
fprintf(stderr, "Regex match failed: %s\n", msgbuf);
exit(1);
}
/* Free memory allocated to the pattern buffer by regcomp() */
regfree(&regex);
}
void read_files() {
size_t path_count = 2;
char pathnames[2][PATH_MAX] = {"./tmp/test0.conf", "./tmp/test1.conf"};
FILE *fp;
char *line = NULL;
size_t len = 0;
ssize_t read_count;
for (int i = 0; i < path_count; i++) {
printf("opening file %s\n", pathnames[i]);
fp = fopen(pathnames[i], "r");
if (fp == NULL) {
printf("internal error,couldn't open file %s\"}", pathnames[i]);
exit(EXIT_FAILURE);
}
int linenum=1;
while ((read_count = getline(&line, &len, fp)) != -1) {
printf("%d: %s",linenum,line);
linenum++;
}
printf("len: %zu\n", len);
fclose(fp);
// len=0; // this is the line that fixes the bug, if i reset len to 0 after reading the first file then everything works as expected, if i don't reset it then regex matching fails
if (line)
free(line);
}
}
int main(int argc, char *argv[]) {
read_files();
if (!match("^[a-zA-Z0-9]+$", "jack")) {
printf("input don't match\n");
}
}
The content of test0.conf:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
The content of test1.conf:
testing123
When I run the above code I get this output:
opening file ./tmp/test0.conf
1: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
len: 240
opening file ./tmp/test1.conf
1: testing123
len: 240
input don't match
So the pattern matching fails for the string "jack", which should in fact match.
You can see that after the first file has been read, len is set to 240, so when getline() runs again for the second file it reads with a buffer size of 240. For some reason this causes the regex matching to fail.
If I reset len to 0 after reading the first file, the code works as expected (the regex matching works fine).
So why does the len parameter of getline() affect the behavior of the GNU regex?
Answer:
So why does the getline() len parameter affect the behavior of the gnu regex?
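As Marian commented, you are using getline() incorrectly, which corrupts the heap. After free(line) runs at the end of the first iteration, line still holds the freed pointer and len is still 240, so the next getline() call treats that freed block as a valid 240-byte buffer and writes into it, and the second free(line) then frees it again. You can observe this by compiling the program with the -fsanitize=address flag and running it. See the AddressSanitizer manual to understand the error.
This is undefined behavior, and your program can do anything. Here it just happens to cause the GNU regex library to stop working correctly. A SIGSEGV is another likely outcome.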
To fix the problem, move the free() call out of the loop and free the memory only after you are done reading all the lines.
Setting line = NULL in the loop after you free() it is another possible (but less efficient) fix.
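A minimal sketch of the first fix, assuming the same reading loop as in the question. The helper name print_files() and its parameters are hypothetical and chosen only for illustration; the point is that a single buffer is reused across all files and freed exactly once, after the loop.

```c
#define _POSIX_C_SOURCE 200809L /* for getline() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: prints every line of each file and returns the total
 * number of lines read, or -1 if a file cannot be opened. */
long print_files(const char *const paths[], size_t path_count) {
    char *line = NULL; /* buffer managed by getline(), reused across files */
    size_t len = 0;    /* capacity of that buffer, kept in sync with line */
    long total = 0;

    assert(paths != NULL);

    for (size_t i = 0; i < path_count; i++) {
        FILE *fp = fopen(paths[i], "r");
        if (fp == NULL) {
            fprintf(stderr, "couldn't open file %s\n", paths[i]);
            free(line); /* free(NULL) is a no-op, so this is always safe */
            return -1;
        }

        int linenum = 1;
        while (getline(&line, &len, fp) != -1) {
            printf("%d: %s", linenum, line);
            linenum++;
            total++;
        }
        fclose(fp);
        /* No free() here: line and len still describe a live buffer,
         * so the next file can reuse it.
         * The alternative fix would be: free(line); line = NULL; len = 0; */
    }

    free(line); /* release the buffer once, after all files are read */
    return total;
}
```

Calling print_files() with an array holding "./tmp/test0.conf" and "./tmp/test1.conf" and a count of 2 should print the same lines as the original program while leaving the heap intact, so a later match() call is no longer affected.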