How to read big chunk of data using fread and access them as an array in C

Time:10-22

I have code that reads a binary file with fread, processes the data, and sends the result to a char *buffer of size 1310720.

For now (with my working code) I read the file byte by byte, do some processing, and write the bytes one at a time into an unsigned char array. I was using something like this:

void read_input_file(FILE *stream,char *buffer)
{
    long i = 0;
    unsigned char tmp_buf[1310720] ;
    unsigned char *calc = malloc(1);
    while(i < 1310720)
    {
        fread(calc,1,1,stream);
        tmp_buf[i] = round(*calc /128);//this is an example
        i++;
    }
    
    memcpy(buffer, &tmp_buf[0], 1310720);
}

It was working, but now (for performance reasons) I want to read chunks of data that are the size of the buffer. So I tried something like this:

void read_input_file(FILE *stream,char *buffer)
{
    long i = 0;
    unsigned char tmp_buf[1310720] ;
    unsigned char *calc = malloc(1310720);

    fread(calc,1310720,1,stream);

    while(i < 1310720)
    {
        tmp_buf[i] = round(calc[i] /128);//this is an example
        i++;
    }
    
    memcpy(buffer, &tmp_buf[0], 1310720);
}

But whatever I do, I get an execution error and the program crashes. I have tried a lot of different things, but nothing works.

So how do I read a big chunk of data into a buffer and then access it byte by byte to process the data?

CodePudding user response:

Since you say you are working on Windows, the problem is what I outlined in my comment:

The stack size limit on Windows is 1024 KiB (1 MiB) and you're trying to create a 1280 KiB buffer on the stack. Use dynamic memory allocation twice. Even on Unix-like systems, the stack size limit is typically 8 MiB, so allocating that large of an array on the stack is a little dubious.

Because your tmp_buf variable is created as a local variable on the stack, you are blowing the stack on Windows.

I think your function should return a status to indicate whether it was successful or not, so I've changed the return type to int and use 0 to indicate success and -1 for failure.

That means your code needs to become more like:

enum { BUFFER_SIZE = 1310720 };

int read_input_file(FILE *stream, char *buffer)
{
    unsigned char *temp = malloc(BUFFER_SIZE);
    unsigned char *calc = malloc(BUFFER_SIZE);

    if (temp == NULL || calc == NULL)
    {
        free(temp);   // Free both in case only one was allocated
        free(calc);
        return -1;
    }

    if (fread(calc, BUFFER_SIZE, 1, stream) != 1)
    {
        free(temp);
        free(calc);
        return -1;
    }


    for (size_t i = 0; i < BUFFER_SIZE; i++)
    {
        temp[i] = round(calc[i] / 128);  //this is an example
    }
    
    memcpy(buffer, temp, BUFFER_SIZE);
    free(temp);
    free(calc);
    return 0;
}

I would be happier if the length of buffer were passed to the function for checking, but that's a decision you can make. It isn't crucial. You could also use the length argument instead of the fixed size (now in the enumerated value) to make your code more general. However, you may not need that generality at the moment.
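As a rough sketch of what passing the length could look like: the name read_input_file_n and the buflen parameter are hypothetical (they are not in the original code), and the calculation is still only the placeholder from the question. This version also asks fread for buflen items of size 1 rather than one item of size buflen, so the return value is a byte count; a short read is treated as a failure here, which may or may not suit your real data.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant: the caller states how many bytes buffer can hold.
   Returns 0 on success, -1 on allocation failure or a short read. */
int read_input_file_n(FILE *stream, char *buffer, size_t buflen)
{
    assert(stream != NULL && buffer != NULL);
    if (buflen == 0)
        return -1;

    unsigned char *temp = malloc(buflen);
    unsigned char *calc = malloc(buflen);
    if (temp == NULL || calc == NULL)
    {
        free(temp);   /* free(NULL) is a no-op, so this is safe */
        free(calc);
        return -1;
    }

    /* Item size 1, count buflen: fread returns the number of bytes read. */
    if (fread(calc, 1, buflen, stream) != buflen)
    {
        free(temp);
        free(calc);
        return -1;
    }

    for (size_t i = 0; i < buflen; i++)
    {
        /* Placeholder calculation. The division is integral, so this gives
           the same result as round(calc[i] / 128) in the original. */
        temp[i] = (unsigned char)(calc[i] / 128);
    }

    memcpy(buffer, temp, buflen);
    free(temp);
    free(calc);
    return 0;
}
```

The caller would then allocate its own buffer (with malloc rather than on the stack, given the size) and pass the same length to the function, for example read_input_file_n(stream, buffer, 1310720).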

It also isn't clear that you can't do away with temp (aka tmp_buf) by using:

calc[i] = round(calc[i] / 128.0);

Note the conversion to a floating-point constant; otherwise, there's nothing to round as the division will be an integer division. However, if the non-example calculation uses other elements of calc than just calc[i], then you do need the second array to hold the results.
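To illustrate the in-place idea, here is a minimal sketch under a few assumptions: the helper name scale_in_place is hypothetical, the calculation only ever looks at the current element, and rounding is done by adding 0.5 and truncating instead of calling round(). For the non-negative byte values involved, that gives the same result as round() and avoids needing the math library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: overwrite each byte with its value divided by 128,
   rounded to the nearest integer. No second array is needed because each
   result depends only on the element it replaces. */
void scale_in_place(unsigned char *data, size_t n)
{
    assert(data != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        /* 128.0 forces floating-point division. Adding 0.5 and truncating
           rounds to nearest for these non-negative values. */
        data[i] = (unsigned char)(data[i] / 128.0 + 0.5);
    }
}
```

With the integer expression data[i] / 128, a byte value of 200 would become 1; with the floating-point division above it becomes 2.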
