How to re-initiate read after reaching EOF during stream decompression with Boost.Iostreams?

Time:01-19

I am trying to implement a streaming decompressor with Boost.Iostreams that can work with incomplete compressed files (the size of the uncompressed file is known before decompression starts). Basically, I run the compressor and decompressor simultaneously, and since the compressor is slower than the decompressor, the decompressor reaches the end of the file. I am trying to reset the stream to re-initiate the read operation, but I could not get it to work: gcount() still returns 0 after clear() and seekg(0). My ultimate goal is a mechanism that continues from the point where the end of the file was reached instead of returning to the beginning, but I cannot even return to the beginning of the file.

I would appreciate any kind of support. Thank you in advance.

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>

#include <boost/iostreams/filtering_streambuf.hpp>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <boost/iostreams/filtering_stream.hpp>

const std::size_t bufferSize = 1024;
const std::size_t testDataSize = 13019119616; 

int main() {

    // Decompress
    std::ofstream outStream("image_boost_decompressed.img", std::ios_base::out | std::ios_base::binary);
    std::ifstream inStream("image_boost_compressed.img.gz", std::ios_base::in | std::ios_base::binary);
    
    boost::iostreams::filtering_istream out;
    out.push(boost::iostreams::gzip_decompressor());
    out.push(inStream);

    char buf[bufferSize] = {};

    std::cout << "Decompression started!" << std::endl;

    std::size_t loopCount = 0;
    std::size_t decompressedDataSize = 0;

    while(decompressedDataSize < testDataSize) {
        std::cout << "cursor bef: " << inStream.tellg() << std::endl; 

        out.read(buf, bufferSize);

        std::cout << "read size: " << out.gcount() << std::endl;
        std::cout << "cursor after: " << inStream.tellg() << std::endl; 

        if (out.gcount() > 0) {
            outStream.write(buf, out.gcount());
            decompressedDataSize += out.gcount();
        } else if (out.gcount() == 0) {
            std::cout << "clear initiated!" << std::endl;
            inStream.clear();
            inStream.seekg(0);
        }
        std::cout << "----------------" << std::endl;
    }

    std::cout << "Decompression ended!" << std::endl;
    std::cout << "decompressed data size: " << decompressedDataSize << std::endl;
    outStream.close();

    return 0;
}


CodePudding user response:

"Basically, I run the compressor and decompressor simultaneously, and since the compressor is slower than the decompressor, the decompressor reaches the end of the file."

In your code you're NOT running a compressor. It's not the slowness of the compressor that causes your program to see EOF. Instead, your EOF is caused by the fact that you actually reach the end of the file.

This means you have a race-condition where you access the file early.

  1. If your aim is to use the "file" only as a temporary station during fully streaming operations, the usual way to approach this is to use a (named) pipe (FIFO on POSIX platforms) instead of a file.

  2. If you cannot do that, the simplest fix is to make sure you start processing files only when they are complete. The usual way to accomplish this is by doing "transactional" uploads (meaning to upload into a temporary location, and then rename the file into place only after completion).

Either way, your program will see EOF only when the writing side has closed its end.
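As a rough illustration of option 2, here is a minimal sketch of the writing side. The helper name publish_when_complete and the ".part" suffix are made up for this example, and it assumes POSIX rename semantics, where the rename replaces any existing destination in a single step.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper: write `data` to "<final_path>.part" first and rename
// it to `final_path` only after the write has finished, so a reader that
// opens `final_path` never sees a half-written file.
bool publish_when_complete(const std::string& final_path,
                           const std::string& data) {
    const std::string tmp_path = final_path + ".part";  // assumed naming convention

    {
        std::ofstream out(tmp_path.c_str(),
                          std::ios::binary | std::ios::trunc);
        if (!out) return false;
        out.write(data.data(), static_cast<std::streamsize>(data.size()));
        out.flush();
        if (!out) return false;
    }  // the stream is closed here, before the rename

    // On POSIX this replaces an existing destination; other platforms may
    // fail if `final_path` already exists.
    return std::rename(tmp_path.c_str(), final_path.c_str()) == 0;
}
```

The reading side then only ever opens the final name, so any EOF it sees is the real end of the data.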

I have some examples of similar approaches on this site, e.g. this networked example that pipes to zcat to do the streaming decompression.

CodePudding user response:

If you want to pick up where you left off, then use seekg(0, std::ios_base::cur). It works:

#include <iostream>
#include <fstream>

int main() {
    std::ofstream out("test.out");
    out << "line 1\n";
    out.flush();
    std::ifstream in("test.out");
    char line[256];
    in.read(line, sizeof(line) - 1);  // leave room for the terminating 0
    line[in.gcount()] = 0;
    std::cout << line;
    if (in.eof())
        std::cout << "-- at eof\n";
    out << "line 2\n";
    out.flush();
    in.clear();
    if (in.good())
        std::cout << "-- now good!\n";
    in.seekg(0, std::ios_base::cur);
    in.read(line, sizeof(line) - 1);  // leave room for the terminating 0
    line[in.gcount()] = 0;
    std::cout << line;
    in.close();
    out.close();
}
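Since the question says the total size is known in advance, the same clear() plus seekg(0, std::ios_base::cur) step could be wrapped in a loop that keeps reading until that many bytes have arrived. The sketch below is for a plain std::ifstream only. The names read_exactly, on_eof and max_retries are invented for this example, and whether a Boost filtering chain with gzip_decompressor resumes the same way after end of input is not shown here.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <functional>
#include <string>

// Hypothetical helper: read `wanted` bytes from `in`, resuming after each
// EOF. `on_eof` is called whenever the reader has caught up with the writer;
// a real program would sleep or wait for a notification there. The result is
// shorter than `wanted` if EOF was hit `max_retries` times or a non-EOF
// error occurred.
std::string read_exactly(std::ifstream& in, std::size_t wanted,
                         const std::function<void()>& on_eof,
                         int max_retries = 100) {
    std::string result;
    char buf[1024];
    int retries = 0;

    while (result.size() < wanted && retries < max_retries) {
        const std::size_t chunk =
            std::min(sizeof(buf), wanted - result.size());
        in.read(buf, static_cast<std::streamsize>(chunk));
        result.append(buf, static_cast<std::size_t>(in.gcount()));

        if (in.eof()) {
            ++retries;
            on_eof();                         // give the writer time to append
            in.clear();                       // drop eofbit/failbit
            in.seekg(0, std::ios_base::cur);  // stay at the current position
        } else if (!in) {
            break;                            // a real error, not just EOF
        }
    }
    return result;
}
```

Calling it with a callback that sleeps for a short interval would turn this into a simple polling reader; the retry limit is there so the loop cannot spin forever if the writer never finishes.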