Using temporary files for working with large amounts of data

Time:10-19

I am trying to implement interthread communication using a temporary serialized file (pointer table containing [no, offset, length] in data file and data file containing data). One thread should receive data (process it) and save it to memory. The second thread should read data from memory and display results. (The input thread is only appending data and the output thread is only reading.)

I must compile it as 32-bit, so I try to solve the 2 GB limit by reading/writing a temporary file.
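For illustration, the [no, offset, length] table described above could be stored as fixed-width records, so that row N always starts at byte N * 16 of the table file. This is only a sketch: the original `table_row` class is not shown, so `TableRow`, `kRowBytes`, `encodeRow` and `decodeRow` are hypothetical names. It assumes the file is read back on the same machine that wrote it (host byte order), and uses a 64-bit offset so positions beyond 4 GB remain representable in a 32-bit build.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical fixed-width index record; not the original table_row.
struct TableRow {
    std::uint32_t no;    // record number
    std::uint64_t off;   // byte offset in the data file (64-bit on purpose)
    std::uint32_t len;   // record length in bytes
};

// On-disk size of one row: 4 + 8 + 4 bytes, independent of struct padding.
constexpr std::size_t kRowBytes = 16;

// Serialize a row into exactly kRowBytes bytes (host byte order).
std::string encodeRow(const TableRow& r) {
    std::string buf(kRowBytes, '\0');
    std::memcpy(&buf[0], &r.no, 4);
    std::memcpy(&buf[4], &r.off, 8);
    std::memcpy(&buf[12], &r.len, 4);
    return buf;
}

// Parse a row; returns false if the buffer is not exactly one row long.
bool decodeRow(const std::string& buf, TableRow& r) {
    if (buf.size() != kRowBytes) return false;
    std::memcpy(&r.no, &buf[0], 4);
    std::memcpy(&r.off, &buf[4], 8);
    std::memcpy(&r.len, &buf[12], 4);
    return true;
}
```

Using an explicit constant for the row size, rather than `sizeof` of the in-memory struct, keeps the reader's stride in step with what the writer actually emits.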

I have implemented a simple example. The problem is that when the input and output threads run simultaneously, the output thread does not read the data correctly. If the input thread writes and closes the files first, and the output thread reads them afterwards, everything works fine. I tried synchronizing with shared_mutex and with mutex, with the same bad results.

Thank you in advance for your responses.

UPDATE: Behavior improves when I reset the stream flags (stream.clear()), as suggested in "Update ifstream object after data is written to its file using ofstream", but the test still sometimes fails and sometimes passes.

Main:

int main() {        
      //Start input and output job
      std::thread input = std::thread(inputJob);
      std::thread output = std::thread(outputJob);
        
      //Wait here for end
      input.join();
      output.join();
    
    //HERE checking results
    
      return 0;
}

Input job:

void inputJob() {
    is_on = true;
    //Loading input data
    for (int i = 1; i < 10; i++) {
        student s("George", "Patton", 100000 + i, (i % 2) > 0, "0A");
        for (int j = 1; j < 4; j++) s.subjects.push_back(subject(rand(), "MA", (j % 5)));
        s1.push_back(s);
    }

    //Save to binary file
    std::fstream data_stream, table_stream;
    table_stream.open("./table.data", std::fstream::out | std::fstream::trunc | std::fstream::binary);
    data_stream.open("./data.data", std::fstream::out | std::fstream::trunc | std::fstream::binary);
    size_t off = 0;
    if (table_stream.is_open() && data_stream.is_open()) for (size_t i = 0; i < s1.size(); i++) {
        std::string tmp = s1[i].toBinaryString();
        size_t sz = tmp.size();
        table_row t(i, off, sz);
        off += sz;
        {
            std::lock_guard lock(m);
            table_stream << t.toBinaryString();
            data_stream << tmp;
            cout << "Written" << endl;
        }       
    }
    table_stream.close();
    data_stream.close();
    is_on = false;
}

Output job:

void outputJob() {
    //Load from binary file
    std::fstream data_stream, table_stream;
    table_stream.open("./table.data", std::fstream::in | std::fstream::binary);
    data_stream.open("./data.data", std::fstream::in | std::fstream::binary);
    if (table_stream.is_open() && data_stream.is_open()) {
        size_t row_sz = sizeof(table_row);
        std::string line = "";
        size_t index = 0;

            unsigned table_r = 0;
            bool was_empty = false;
            while (is_on || (was_empty == false)) {
                {
                    std::lock_guard lock(m);
                    if (tryGetData(table_stream, line, row_sz, table_r) == 0 && line.empty() == false) {
                        table_r += row_sz;
                        was_empty = false;
                        table_row row;
                        index = 0;
                        row.fromBinaryString(line, index, line.size());

                        if (tryGetData(data_stream, line, row.len, row.off) == 0 && line.empty() == false) {
                            was_empty = false;
                            index = 0;
                            student tmp;
                            tmp.fromBinaryString(line, index);
                            if (VAL_CHECK == 1) s2.push_back(tmp);
                        }
                        else was_empty = true;
                    }
                    else was_empty = true;
                }
            }
        
    }
    table_stream.close();
    data_stream.close();
}

Try get data function:

int tryGetData(std::fstream &data_stream, std::string& data, size_t data_sz, size_t offset) {
    int ret = 0;
    data = "";
        if (data_stream.is_open()) {
            //Set ptr
            if (offset != UINT32_MAX) data_stream.seekp(offset);

            char c;
            while (data_sz > 0 && data_stream.get(c)) {
                data.push_back(c);
                data_sz--;
            }
            if (data_stream.eof()) ret = 1;
        }
    return ret;
}
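
As a point of comparison, here is a minimal sketch of a read helper that applies the clear() step from the UPDATE on every call and positions the stream with seekg (the read position) rather than seekp. `readExact` is a hypothetical name, not part of the original code. It assumes the writer flushes its stream after each record, since otherwise the bytes may still be sitting in the writer's buffer. Whether an already open input stream sees data appended later depends on the standard library and the OS.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: try to read exactly `len` bytes starting at `offset`.
// Returns true only if the whole record was available.
bool readExact(std::ifstream& in, std::streamoff offset, std::size_t len,
               std::string& out) {
    in.clear();            // drop eofbit/failbit left by an earlier short read
    in.seekg(offset);      // move the read position
    out.assign(len, '\0');
    in.read(&out[0], static_cast<std::streamsize>(len));
    if (static_cast<std::size_t>(in.gcount()) != len) {
        out.clear();       // partial record: the writer has not caught up yet
        return false;
    }
    return true;
}
```

A caller would retry the same offset later when the function returns false, instead of treating a short read as the end of the data.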

CodePudding user response:

The simplest solution I can think of is using double-buffering. Have two of each file type, and make sure the input thread is writing into one pair while the output thread is always reading the other pair.

All other solutions I can think of would require making sure that the OS or the file library is not doing any caching on the file and thus would be platform-specific. But if that's not a problem, read up on memory mapped files, for example.

CodePudding user response:

The main problem is that the writing thread may start later than the reading thread. If the reading thread waits until the writing thread has opened the files, it works like a charm.
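
A minimal sketch of that ordering, assuming a condition variable is acceptable for the handshake: the reader blocks until the writer has created (truncated) the file, and only then opens it. The names `ready_mutex`, `ready_cv`, `files_ready`, `writerOpen`, `readerOpen` and `openInOrder` are hypothetical and do not appear in the original code. The same race plausibly affects the `is_on` flag in the question, which the output thread may test before the input thread has set it.

```cpp
#include <condition_variable>
#include <fstream>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical shared state for the "file is ready" handshake.
std::mutex ready_mutex;
std::condition_variable ready_cv;
bool files_ready = false;

// Writer side: create (truncate) the file first, then announce it.
void writerOpen(std::ofstream& out, const std::string& path) {
    out.open(path, std::ios::binary | std::ios::trunc);
    {
        std::lock_guard<std::mutex> lock(ready_mutex);
        files_ready = true;
    }
    ready_cv.notify_all();
}

// Reader side: block until the writer has created the file, then open it.
void readerOpen(std::ifstream& in, const std::string& path) {
    std::unique_lock<std::mutex> lock(ready_mutex);
    ready_cv.wait(lock, [] { return files_ready; });
    in.open(path, std::ios::binary);
}

// Starts the reader first on purpose; it still opens the file only after
// the writer has created it. Returns whether the reader's open succeeded.
bool openInOrder(const std::string& path) {
    std::ifstream in;
    std::ofstream out;
    std::thread reader([&] { readerOpen(in, path); });
    std::thread writer([&] { writerOpen(out, path); });
    reader.join();
    writer.join();
    return in.is_open();
}
```

Without such a handshake, a reader that wins the race either fails to open a file that does not exist yet or opens a stale file that the writer then truncates underneath it.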
