C : read int from binaryfile

Time:10-30

I have pixels from an image which are stored in a binary file.

I would like to use a function to quickly read this file.

For the moment I have this:

std::vector<int> _data;    

std::ifstream file(_rgbFile.string(), std::ios_base::binary);
while (!file.eof())
{
    char singleByte[1];
    file.read(singleByte, 1);
    int b = singleByte[0];
    _data.push_back(b);
}
std::cout << "end" << std::endl;
file.close();

But on 4096 * 4096 * 3 images it already takes a little time.

Is it possible to optimize this function?

Answer:

You could make this faster by reading the whole file in one go, and preallocating the necessary storage in the vector beforehand:

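std::ifstream file(_rgbFile.string(), std::ios_base::binary);

// Determine the file size by seeking to the end and back.
std::streampos posStart = file.tellg();
file.seekg(0, std::ios::end);
std::streampos posEnd = file.tellg();
file.seekg(posStart);

// Allocate the whole buffer once, then fill it with a single read().
std::vector<char> _data;
_data.resize(static_cast<std::size_t>(posEnd - posStart), 0);
// data() is valid even for an empty vector, unlike &_data[0].
file.read(_data.data(), posEnd - posStart);
std::cout << "end" << std::endl;
file.close();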
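
The question stores the pixels as int while the snippet above keeps them as char. If the int values are still needed, a conversion along these lines might work. This is only a sketch: `toInts` is a hypothetical helper name, and it assumes the pixel values are meant to be in the range 0..255, which is why each byte goes through unsigned char first (plain char may be signed, so bytes of 128 and above would otherwise turn into negative ints).

```cpp
#include <vector>

// Hypothetical helper: widens raw bytes to ints in the range 0..255.
std::vector<int> toInts(const std::vector<char>& bytes)
{
    std::vector<int> out;
    out.reserve(bytes.size());  // one allocation for the whole result
    for (char c : bytes) {
        // Convert via unsigned char so that e.g. 0xFF becomes 255, not -1.
        out.push_back(static_cast<unsigned char>(c));
    }
    return out;
}
```

Calling `toInts(_data)` after the read would give a std::vector<int> like the one in the question, at the cost of four times the memory on typical platforms.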

Avoiding unnecessary i/o

Reading the whole file with a single read() call avoids millions of separate read calls, each of which has to go through the ifstream's internal buffer. If the file is very large and you don't want to load it all into memory at once, you can read it in smaller chunks of, say, a few MB each.

You also avoid a lot of function calls: reading byte by byte means calling ifstream::read 50'331'648 times (4096 * 4096 * 3)!

vector preallocation

std::vector grows dynamically when you try to insert new elements but no space is left. Each time the vector resizes, it needs to allocate a new, larger, memory area and copy all current elements in the vector over to the new location.
Most vector implementations choose a growth factor between 1.5 and 2, so each reallocation is 1.5 to 2 times larger than the previous one.

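This can be completely avoided by calling std::vector::reserve or std::vector::resize up front. With either function the vector's memory only needs to be allocated once, with room for at least as many elements as you requested. The difference is that reserve only allocates the storage, while resize also creates the elements.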
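
One way to see the effect is to count how often the capacity changes while elements are appended. The function below is an illustrative sketch (`countReallocations` is a made-up name), and the exact count without reserve depends on the standard library's growth factor.

```cpp
#include <cstddef>
#include <vector>

// Appends `count` elements and returns how many times the vector's capacity
// changed along the way, i.e. how many reallocations took place.
std::size_t countReallocations(std::size_t count, bool reserveFirst)
{
    std::vector<int> v;
    if (reserveFirst) v.reserve(count);  // single up-front allocation

    std::size_t reallocations = 0;
    std::size_t lastCapacity = v.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != lastCapacity) {
            ++reallocations;
            lastCapacity = v.capacity();
        }
    }
    return reallocations;
}
```

With `reserveFirst` set to true the function returns 0, because push_back never reallocates while the size stays within the reserved capacity. Without it, the vector reallocates repeatedly, and every reallocation copies or moves all elements stored so far.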

Godbolt example

Here's a godbolt example that shows the performance improvement.

Testing a ~50 MB file (4096*4096*3 bytes):

  • gcc 11.2, with optimizations disabled: old 1300 ms, new 16 ms
  • gcc 11.2, -O3: old 878 ms, new 13 ms

Small bug in the code

As @TedLyngmo has pointed out, your code also contains a small bug.
The EOF flag is only set once you have tried to read past the end of the file (see this question).

So the last read, the one that sets the EOF bit, didn't actually read a byte, and you end up with one extra element in your array that contains uninitialized garbage.

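You could fix this by checking the stream state directly after the read. Testing the stream itself rather than only eof() also ends the loop if the file could not be opened or the read fails for another reason:

while (true) {
    char singleByte[1];
    file.read(singleByte, 1);
    if (!file) break;  // EOF or any other read failure: nothing was read
    int b = singleByte[0];
    _data.push_back(b);
}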