Valgrind conditional jump ... error with PCRE2 JIT when reading from file

Time:12-14

I have a very interesting problem.

I'd like to use PCRE2 and its JIT compiler. The task is simple: read lines from a file and search them for a pattern.

Here is the sample code:

#include <stdio.h>
#include <string.h>

#define PCRE2_CODE_UNIT_WIDTH 8
#include <pcre2.h>

int search(pcre2_code *re, unsigned char * subject) {

    pcre2_match_data *match_data_real = pcre2_match_data_create_from_pattern(re, NULL);

    size_t len_subject = strlen((const char *)subject);

    int rc = pcre2_match(
        re,
        (PCRE2_SPTR)subject,
        len_subject,
        0,
        0,
        match_data_real,
        NULL
    );

    pcre2_match_data_free(match_data_real);
    return rc;
}

int main(int argc, char ** argv) {

    unsigned char subject[][100]     = {
        "this is a foobar",
        "this is a barfoo",
        "this is a barbar",
        "this is a foofoo"
    };

    pcre2_code *re;
    PCRE2_SPTR  pattern = (unsigned char *)"foo";
    int         errornumber;
    PCRE2_SIZE  erroroffset;

    re = pcre2_compile(
        pattern,
        PCRE2_ZERO_TERMINATED,
        0,
        &errornumber,
        &erroroffset,
        NULL
    );
    pcre2_jit_compile(re, PCRE2_JIT_COMPLETE);

    FILE *fp;

    int s = 0;
    while(s < 2) {
        search(re, subject[s++]);
    }

    if (argc >= 2) {
        fp = fopen(argv[1], "r");
        if (fp != NULL) {
            char tline[2048];
            while(fgets(tline, 2048, fp) != NULL) {
                search(re, (unsigned char *)tline);
            }
            fclose(fp);
        }
    }

    pcre2_code_free(re);

    return 0;
}

Compile the code:

gcc -Wall -O2 -g pcretest.c -o pcretest -lpcre2-8

As you can see, in line 58 I check whether an argument was given; if so, the code tries to open it as a file.

Also, as you can see in line 49, I'd like to use PCRE2's JIT.

The code works, but when I checked it with Valgrind I found some interesting behavior:

  • If I pass a file as an argument, Valgrind reports "Conditional jump or move depends on uninitialised value(s)" and "Uninitialised value was created by a stack allocation", but it points to main(). Command: valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all --track-origins=yes -s ./pcretest myfile.txt
  • Without an argument, Valgrind reports nothing. Command: valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all --track-origins=yes -s ./pcretest
  • If I comment out the pcre2_jit_compile(re, PCRE2_JIT_COMPLETE); call in line 49, everything works and Valgrind reports nothing. Command: valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all --track-origins=yes -s ./pcretest myfile.txt

The relevant Valgrind output:

==31385== Conditional jump or move depends on uninitialised value(s)
==31385==    at 0x4EECD1A: ???
==31385==    by 0x1FFEFFFC1F: ???
==31385==  Uninitialised value was created by a stack allocation
==31385==    at 0x1090FA: main (pcretest.c:27)
...
==31385== HEAP SUMMARY:
==31385==     in use at exit: 0 bytes in 0 blocks
==31385==   total heap usage: 12 allocs, 12 frees, 13,486 bytes allocated
==31385== 
==31385== All heap blocks were freed -- no leaks are possible
==31385== 
==31385== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
==31385== 
==31385== 1 errors in context 1 of 1:
==31385== Conditional jump or move depends on uninitialised value(s)
==31385==    at 0x4EECD1A: ???
==31385==    by 0x1FFEFFFC1F: ???
==31385==  Uninitialised value was created by a stack allocation
==31385==    at 0x1090FA: main (pcretest.c:27)
==31385== 
==31385== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

Line 27 is the int main(...) line.

What am I missing?

Answer:

Observations:

  • The Valgrind report is telling you that the uninitialized data being accessed are in the stack frame of the initial call to main(). However,

  • even though you're compiling with debug information, the Valgrind report does not implicate a specific variable. Also,

  • the report's stack trace for the error does not present function names, and does not trace back to main(). And of course,

  • the error is not reported when you disable JIT compilation of the pattern.

Apparently, then, the error is associated with the machine code generated by PCRE2's JIT compiler. If you don't perform JIT compilation then you get correct operation via the ordinary matching path. If you do perform JIT compilation then the JIT-generated code is engaged, and that code triggers the Valgrind error. You might nevertheless get correct matching, but I would not rely on that for code that triggers the Valgrind error observed.

I played around with variations on your code, and discovered that the error is specifically associated with the calls to pcre2_match_data_create_from_pattern() and pcre2_match() in function search(). Either one will cause Valgrind to report the error. But why does the error occur only in some calls to search()?

It seems likely to be because the JIT compilation sets up data structures in main()'s stack frame that are clobbered by executing the body of the if (argc >= 2) statement. This is consistent with the fact that I was able to avoid the error by adding an initializer for variable tline in that block:

            char tline[2048] = {0};

I can imagine a variety of scenarios for why that might make a difference, all having to do with how the JIT-generated code and the compiler-generated code manipulate the stack pointer.
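One ingredient of why the initializer can matter is easy to show without PCRE2: fgets() stores only the characters of the line plus a terminating null byte, so without an initializer the remainder of tline holds indeterminate stack data. The sketch below is not from the original post. read_line_zeroed() and tail_is_zero() are hypothetical helper names, and memset() stands in for the = {0} initializer. Whether the JIT-generated code actually inspects bytes past the terminator is an assumption here, not something established above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of the original program): zero the whole
 * buffer, then read one line into it. Returns the line length including
 * any trailing newline, or -1 at end of file or on a read error. */
long read_line_zeroed(FILE *fp, char *buf, size_t cap)
{
    assert(fp != NULL && buf != NULL && cap > 0);

    /* Same effect as declaring `char tline[2048] = {0};`, but repeated
     * before every read so stale data from a longer line cannot remain. */
    memset(buf, 0, cap);

    if (fgets(buf, (int)cap, fp) == NULL)
        return -1;
    return (long)strlen(buf);
}

/* Returns 1 if every byte after the string terminator is zero, i.e. the
 * part of the buffer that fgets() did not write holds defined values. */
int tail_is_zero(const char *buf, size_t cap)
{
    size_t i;

    for (i = strlen(buf) + 1; i < cap; i++) {
        if (buf[i] != 0)
            return 0;
    }
    return 1;
}
```

In the program from the question, the loop condition would become read_line_zeroed(fp, tline, sizeof tline) >= 0. Clearing a 2048-byte buffer per line costs a little time, which is the trade-off for every byte being defined when the matcher runs.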

Personally, discovering such an issue would likely persuade me to stay far away from PCRE2's JIT compiler. I would definitely do that at least until I had evidence that pattern matching was a performance hotspot for my program. If you must engage the JIT, however, then here are some recommendations that might (or might not) help you avoid trouble:

  1. Take "just in time" to heart: perform JIT as close as possible to when you actually use the pattern.

  2. Do not assume that the JIT code is long-term viable. In particular, it probably is unsafe to use after the function that calls the JIT compiler returns, but it might not be good even that long.

  3. Use the JIT-compiled regex (only) in the same function that runs the JIT compiler.

  4. Make that function as simple as possible.

  5. Declare all local variables of that function at the beginning, with initializers.

  6. Test thoroughly.

That's more than seems to have been necessary to resolve the issue for your particular example code, but it's aimed more generally at reducing the cross section for the compiled program violating assumptions made by the JIT.
