Home > database >  Scanning binary files for a virus signature
Scanning binary files for a virus signature

Time:05-14

I'm trying to scan for a virus signature in a binary file (without using strstr()), and printing if the signature was found or not. But the code doesn't work. I read the files outside of the function.

Unfortunately the code doesn't work, even when I'm trying to scan the virus signature itself.

int searchForSignature(FILE* file_to_search, FILE* virus) {
    int v_size = 0;
    int f_size = 0;


    fseek(virus, 0L, SEEK_END);
    v_size = ftell(virus);
    fseek(virus, 0, SEEK_SET);
    fseek(file_to_search, 0L, SEEK_END);
    f_size = ftell(file_to_search);
    fseek(file_to_search, 0, SEEK_SET);

    printf("VIRUS SIZE: %d FILE SIZE: %d\n", v_size, f_size);

    if (v_size > f_size)
    {
        printf("VIRUS NOT FOUND\n");
        return 1;
    }

    int counter = 0;  char ch = ' '; char ch2 = ' ';
    while ((ch = (char)fgetc(file_to_search)) != EOF)
    {
        printf("%d\n", counter);
        if (counter == v_size)
        {
            printf("VIRUS FOUND\n");
            return 0;
        }
        else
        {
            ch2 = (char)fgetc(virus);
            if (ch == ch2)
            {
                counter  ;
            }
            else
            {
                counter = 0;
            }
        }
    }

    printf("VIRUS NOT FOUND\n");
    return 1;
}

how I read the files outside of the function:

FILE* virus_file = NULL;
virus_file = fopen("example file path", "rb");
if (virus_file == NULL)
{
    printf("Error opening file");
    return 1;
}

FILE* file_to_scan = NULL;
file_to_scan = fopen("example file path 2", "rb");
if (file_to_scan == NULL)
{
    printf("Error opening file");
    return 1;
}

CodePudding user response:

At least these problems

Success too late

counter == v_size may be true right after counter ;. There may not exist another character in the file for the next loop's if (counter == v_size). Instead test counter == v_size right after incrementing.

No reset

If only a partial match is found, match testing should not resume on the next source file character, but back up to the beginning of the compare, then advance 1.

Consider:

search file: "aaac"
virus:       "aac"

Searching search file fails on "aa(a)..." (the 3rd 'a'), yet needs to resume on "a(a)" (the 2nd 'a').

Virus file needs to rewind() on failed match. I rccoemnd to read the entire virus signature into allocated memory (e.g.: unsigned char *virus = malloc(v_size); rather than re-reading from a file.

Lost of information

int fgetc() returns 257 different values: 0-255 and EOF. Saving that result in a char loses information. Use int ch = fgetc(...), int ch2 = fgetc(...).

This can explain OP's "code doesnt working, even when Im trying to scan the virus signature itself." as one of the virus bytes is wrongly matches EOF when converted to char.

Casts are not needed to solve this.

Wrong type length

ftell(virus) returns a long. Saving the result in an int can lose information. Also code lacks error checks here.

Unneeded code

if (v_size > f_size) block is not needed.


Alternative

// Pseudo code
Read virus into an allocated buffer.
Read a chunk of test file into a test buffer at least 2x virus size.
Walk test buffer (1 char at a time) trying `memcmp()`. 
  Stop if match found,
  until remaining test buffer not full enough for a possible match.
Move remaining test buffer to start and re-fill as able.
If not able to re-fill at all, we are done - no match.

This is O(test_size * virus_size). A more sophisticated "walk" is O(test_size virus_size).

  • Related