Faster way of loading (big) std::vector<std::vector<float>> from file

I have implemented a way to save a std::vector of vectors to file and read them using this code (found here on stackoverflow):

Saving:

void saveData(std::string path)
{
    std::ofstream FILE(path, std::ios::out | std::ofstream::binary);

    // Store size of the outer vector
    int s1 = RecordData.size();
    FILE.write(reinterpret_cast<const char*>(&s1), sizeof(s1));

    // Now write each vector one by one
    for (auto& v : RecordData) {
        // Store its size
        int size = v.size();
        FILE.write(reinterpret_cast<const char*>(&size), sizeof(size));

        // Store its contents
        FILE.write(reinterpret_cast<const char*>(&v[0]), v.size() * sizeof(float));
    }
    FILE.close();
}

Reading:

void loadData(std::string path)
{
    std::ifstream FILE(path, std::ios::in | std::ifstream::binary);

    if (RecordData.size() > 0) // Clear data
    {
        for (int n = 0; n < RecordData.size(); n++)
            RecordData[n].clear();
        RecordData.clear();
    }

    int size = 0;
    FILE.read(reinterpret_cast<char*>(&size), sizeof(size));
    RecordData.resize(size);
    for (int n = 0; n < size; ++n) {
        int size2 = 0;
        FILE.read(reinterpret_cast<char*>(&size2), sizeof(size2));
        float f;
        //RecordData[n].reserve(size2); // This doesn't make a difference in speed
        for (int k = 0; k < size2; ++k) {
            FILE.read(reinterpret_cast<char*>(&f), sizeof(f));
            RecordData[n].push_back(f);
        }
    }
}

This works perfectly, but loading for a big dataset (980MB, size 32000 for inner vectors and 1600 of those) takes ~7-8 seconds (in contrast to saving, which is done in under 1 sec.). Since I can see memory-usage in Visual Studio going up slowly during loading, my guess would be a lot of memory allocations. The commented out line RecordData[n].resize(size2); doesn't make a difference, though.

Can anybody give me a faster way of loading this kind of data? My first try was putting all the data in one big std::vector<float>, but for some reason that seemed to cause some kind of overflow (which shouldn't happen, because sizeof(int) = 4, so ~4 billion should be enough for an index variable; does std::vector use something else internally?). Also, it would be really nice to keep a data structure of std::vector<std::vector<float>>. In the future I will have to handle much bigger datasets (although I will probably use <short> for that to save memory and treat it as a fixed-point number), so loading speed will become more significant...
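On the overflow with a single big vector: the post does not show that code, so the cause is not known. One common culprit, offered here only as a guess, is doing the index or byte-count arithmetic in int. A 4-byte int is signed, so it tops out at about 2.1 billion rather than 4 billion, while std::vector itself indexes with its size_type (normally std::size_t). A minimal sketch of a flat layout that keeps all arithmetic in std::size_t; the name FlatRecords and its members are hypothetical:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical flat container: all rows live in one contiguous buffer.
struct FlatRecords {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data;

    FlatRecords(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c) {}

    // Row-major indexing; the multiplication is done in std::size_t, not int.
    float& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }

    // Number of bytes to pass to a single read()/write() call for the whole buffer.
    std::size_t byteCount() const { return data.size() * sizeof(float); }
};
```

With this layout the whole dataset can be read or written in one call, at the cost of requiring every row to have the same length.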

Edit:

I should point out that 32000 for the inner vector and 1600 for the outer vector is just an example. Both can vary. I think I would have to save an "index-vector" as the first inner vector to declare the number of items for the rest (like I said in a comment: I'm a first-time file reader/writer and haven't used std::vector for more than a week or two, so I'm not sure about that). I will look into block-reading and post the result in a later edit...

Edit2:

So, here is the version of perivesta (thank you for that). The only change I made is discarding RV& RecordData because this is a global variable for me.

Curiously, this only brings my loading time down from ~7000 ms to ~1500 ms for a 980 MB file, whereas perivesta went from 7429 ms to 644 ms for a 2 GB file (strange how much the speeds differ between systems ;-) )

void loadData2(std::string path)
{
    std::ifstream FILE(path, std::ios::in | std::ifstream::binary);

    if (RecordData.size() > 0) // Clear data
    {
        for (int n = 0; n < RecordData.size(); n++)
            RecordData[n].clear();
        RecordData.clear();
    }

    int size = 0;
    FILE.read(reinterpret_cast<char*>(&size), sizeof(size));
    RecordData.resize(size);
    for (auto& v : RecordData) {
        // load its size
        int size2 = 0;
        FILE.read(reinterpret_cast<char*>(&size2), sizeof(size2));
        v.resize(size2);

        // load its contents
        FILE.read(reinterpret_cast<char*>(&v[0]), v.size() * sizeof(float));
    }
}

CodePudding user response：

First of all, since you know the number of elements up front, you should reserve space in your vector to prevent unnecessary reallocations as the vector grows. Secondly, all those push_backs are probably costing you. That function does have some overhead. And thirdly, as Alan says, reading the entire file all in one go can't possibly hurt, which you can do if you resize (as opposed to reserve) the vector first.

So, with all that said, I would do this (once you have read the size of the data into size2):

RecordData[n].resize(size2);             // allocates and zero-initialises size2 floats in the inner vector
FILE.read(reinterpret_cast<char*>(RecordData[n].data()), size2 * sizeof(float));

I would think this is optimal.

It's unfortunate in cases like this, IMO, that std::vector insists on zero-initialising all size2 elements when you call resize, since you're immediately going to overwrite them, but I don't know of an easy way to prevent this. You'd need to get into custom allocators, and it's probably not worth the effort.
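For readers curious about the custom-allocator route mentioned above, here is a minimal sketch of a commonly used allocator adaptor that default-initialises elements instead of value-initialising them, so resize leaves floats uninitialised. The name default_init_allocator is an assumption of this sketch, not something from the question, and whether it yields a measurable speed-up here has not been measured:

```cpp
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Allocator adaptor: construct() with no arguments default-initialises
// (leaves trivial types such as float uninitialised) instead of zeroing them.
template <typename T, typename A = std::allocator<T>>
class default_init_allocator : public A {
    using traits = std::allocator_traits<A>;

public:
    template <typename U>
    struct rebind {
        using other =
            default_init_allocator<U, typename traits::template rebind_alloc<U>>;
    };

    using A::A; // inherit the base allocator's constructors

    // Called by resize(n): default-initialisation, no zero fill for float.
    template <typename U>
    void construct(U* ptr) noexcept(std::is_nothrow_default_constructible<U>::value)
    {
        ::new (static_cast<void*>(ptr)) U;
    }

    // Every other form of construction is forwarded to the base allocator.
    template <typename U, typename... Args>
    void construct(U* ptr, Args&&... args)
    {
        traits::construct(static_cast<A&>(*this), ptr, std::forward<Args>(args)...);
    }
};
```

An inner vector would then be declared as std::vector<float, default_init_allocator<float>>, which is a different type from std::vector<float>, so the outer container's type changes as well. Elements must be written (for example by FILE.read) before they are read.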

CodePudding user response：

This is an implementation of Alan Birtles' comment: When reading, read an inner vector with one single FILE.read call instead of many individual ones. This reduces the time dramatically on my system:

These are the results for a 2GB file:

Writing    took 2283 ms
Reading v1 took 7429 ms
Reading v2 took 644 ms

Here is the code that produces this output:

#include <vector>
#include <iostream>
#include <string>
#include <chrono>
#include <random>
#include <fstream>

using RV = std::vector<std::vector<float>>;

void saveData(std::string path, const RV& RecordData)
{
    std::ofstream FILE(path, std::ios::out | std::ofstream::binary);

    // Store size of the outer vector
    int s1 = RecordData.size();
    FILE.write(reinterpret_cast<const char*>(&s1), sizeof(s1));

    // Now write each vector one by one
    for (auto& v : RecordData) {
        // Store its size
        int size = v.size();
        FILE.write(reinterpret_cast<const char*>(&size), sizeof(size));

        // Store its contents
        FILE.write(reinterpret_cast<const char*>(&v[0]), v.size() * sizeof(float));
    }
    FILE.close();
}

//original version for comparison
void loadData1(std::string path, RV& RecordData)
{
    std::ifstream FILE(path, std::ios::in | std::ifstream::binary);

    if (RecordData.size() > 0) // Clear data
    {
        for (int n = 0; n < RecordData.size(); n++)
            RecordData[n].clear();
        RecordData.clear();
    }

    int size = 0;
    FILE.read(reinterpret_cast<char*>(&size), sizeof(size));
    RecordData.resize(size);
    for (int n = 0; n < size; ++n) {
        int size2 = 0;
        FILE.read(reinterpret_cast<char*>(&size2), sizeof(size2));
        float f;
        //RecordData[n].resize(size2); // This doesn't make a difference in speed
        for (int k = 0; k < size2; ++k) {
            FILE.read(reinterpret_cast<char*>(&f), sizeof(f));
            RecordData[n].push_back(f);
        }
    }
}

//my version
void loadData2(std::string path, RV& RecordData)
{
    std::ifstream FILE(path, std::ios::in | std::ifstream::binary);

    if (RecordData.size() > 0) // Clear data
    {
        for (int n = 0; n < RecordData.size(); n++)
            RecordData[n].clear();
        RecordData.clear();
    }

    int size = 0;
    FILE.read(reinterpret_cast<char*>(&size), sizeof(size));
    RecordData.resize(size);
    for (auto& v : RecordData) {
        // load its size
        int size2 = 0;
        FILE.read(reinterpret_cast<char*>(&size2), sizeof(size2));
        v.resize(size2);

        // load its contents
        FILE.read(reinterpret_cast<char*>(&v[0]), v.size() * sizeof(float));
    }
}

int main()
{
    using namespace std::chrono;
    const std::string filepath = "./vecdata";
    const std::size_t sizeOuter = 16000;
    const std::size_t sizeInner = 32000;
    RV vecSource;
    RV vecLoad1;
    RV vecLoad2;

    const auto tGen1 = steady_clock::now();
    std::cout << "generating random numbers..." << std::flush;
    std::random_device dev;
    std::mt19937 rng(dev());
    std::uniform_real_distribution<float> dis;
    for(int i = 0; i < sizeOuter; ++i)
    {
        RV::value_type inner;
        for(int k = 0; k < sizeInner; ++k)
        {
            inner.push_back(dis(rng));
        }
        vecSource.push_back(inner);
    }
    const auto tGen2 = steady_clock::now();

    std::cout << "done\nSaving..." << std::flush;
    const auto tSave1 = steady_clock::now();
    saveData(filepath, vecSource);
    const auto tSave2 = steady_clock::now();

    std::cout << "done\nReading v1..." << std::flush;
    const auto tLoadA1 = steady_clock::now();
    loadData1(filepath, vecLoad1);
    const auto tLoadA2 = steady_clock::now();
    std::cout << "verifying..." << std::flush;
    if(vecSource != vecLoad1) std::cout << "FAILED! ...";

    std::cout << "done\nReading v2..." << std::flush;
    const auto tLoadB1 = steady_clock::now();
    loadData2(filepath, vecLoad2);
    const auto tLoadB2 = steady_clock::now();
    std::cout << "verifying..." << std::flush;
    if(vecSource != vecLoad2) std::cout << "FAILED! ...";


    std::cout << "done\nResults:\n" <<
        "Generating took " << duration_cast<milliseconds>(tGen2 - tGen1).count() << " ms\n" <<
        "Writing    took " << duration_cast<milliseconds>(tSave2 - tSave1).count() << " ms\n" <<
        "Reading v1 took " << duration_cast<milliseconds>(tLoadA2 - tLoadA1).count() << " ms\n" <<
        "Reading v2 took " << duration_cast<milliseconds>(tLoadB2 - tLoadB1).count() << " ms\n" <<
        std::flush;
}
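
None of the loaders above check whether the file opened or whether a read succeeded. The following is a minimal sketch of the same block-reading approach with those checks added. It assumes the file was written by saveData on a platform where int is 32 bits; the name loadDataChecked is hypothetical:

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

using RV = std::vector<std::vector<float>>;

// Hypothetical helper: returns false on any failure and leaves `out` untouched.
bool loadDataChecked(const std::string& path, RV& out)
{
    std::ifstream in(path, std::ios::in | std::ios::binary);
    if (!in) return false; // file missing or unreadable

    // Assumption: sizes were stored as 32-bit signed integers.
    std::int32_t outer = 0;
    if (!in.read(reinterpret_cast<char*>(&outer), sizeof(outer)) || outer < 0)
        return false;

    RV tmp(static_cast<std::size_t>(outer));
    for (auto& v : tmp) {
        std::int32_t inner = 0;
        if (!in.read(reinterpret_cast<char*>(&inner), sizeof(inner)) || inner < 0)
            return false;

        v.resize(static_cast<std::size_t>(inner));
        // One block read per inner vector; skipped for empty vectors.
        if (inner > 0 &&
            !in.read(reinterpret_cast<char*>(v.data()),
                     static_cast<std::streamsize>(v.size() * sizeof(float))))
            return false; // truncated file
    }

    out = std::move(tmp); // commit only after everything was read
    return true;
}
```

Reading into a temporary and moving it into place at the end also makes the manual clearing loop unnecessary.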

CodePudding user response：

//RecordData[n].resize(size2); // This doesn't make a difference in speed

If you use this line (while not changing the rest of the code) one should expect the code to be slower, not faster!

resize changes the size of the vector and then you push more elements to it, resulting in a vector of double the size you actually need.

I suppose you wanted reserve instead. reserve only allocates capacity without changing the size of the vector and then pushing elements can be expected to be faster, because memory is only allocated once.

Alternatively use resize and then assign to already existing elements.
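
To make the difference concrete, here is a small sketch of the three combinations discussed above. The function names are made up for illustration:

```cpp
#include <cstddef>
#include <vector>

// reserve + push_back: capacity is allocated once, size() grows per element.
std::vector<float> fillWithReserve(std::size_t n)
{
    std::vector<float> v;
    v.reserve(n);
    for (std::size_t k = 0; k < n; ++k)
        v.push_back(static_cast<float>(k));
    return v; // size() == n
}

// resize + assignment: the elements already exist and are overwritten.
std::vector<float> fillWithResize(std::size_t n)
{
    std::vector<float> v;
    v.resize(n);
    for (std::size_t k = 0; k < n; ++k)
        v[k] = static_cast<float>(k);
    return v; // size() == n
}

// The pitfall: resize + push_back appends after n zero-initialised elements.
std::vector<float> fillWithResizeAndPushBack(std::size_t n)
{
    std::vector<float> v;
    v.resize(n);
    for (std::size_t k = 0; k < n; ++k)
        v.push_back(static_cast<float>(k));
    return v; // size() == 2 * n
}
```

The first two produce identical vectors; the third ends up twice as long, with the loaded values in the second half.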