istream_iterator behavior misunderstanding

Time:09-17

The goal is to read 16-bit signed integers from a binary file. First, I open the file as an ifstream; then I would like to copy each number into a vector using istream_iterator and the copy algorithm. I don't understand what's wrong with this snippet:

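#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

int main(int argc, char *argv[]) {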
    std::string filename("test.bin");

    std::ifstream is(filename);
    if (!is) {
        std::cerr << "Error while opening input file\n";
        return EXIT_FAILURE;
    }
    
    std::noskipws(is);
    std::vector<int16_t> v;
    std::copy(
        std::istream_iterator<int16_t>(is),
        std::istream_iterator<int16_t>(),
        std::back_inserter(v)
    );

    //v is still empty
}

This code produces no error, but the vector remains empty after the call to std::copy. Since I'm opening the file in the standard input mode ("textual" mode), I was expecting istream_iterator to work even if the file is binary. Clearly there's something I'm missing about the behavior of this class.

Answer:

First off, to read a binary file with ifstream, you need to open the file in binary mode, not text mode (the default). Otherwise, on platforms such as Windows, read operations may misinterpret line-break bytes and translate them (i.e., CRLF -> LF), thus corrupting your binary data.

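Second, istream_iterator uses operator>>, which performs formatted extraction: it reads and parses human-readable text (for int16_t, an optional sign followed by decimal digit characters). That is not what you want when reading a binary file. With raw binary data the very first extraction usually fails, because the bytes are not digit characters; that sets failbit, the iterator becomes equal to the end-of-stream iterator, and std::copy copies nothing. You need to use istream::read() instead. However, the standard library has no iterator wrapper for that (but you can write your own if needed).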

Try this instead:

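#include <cstdint>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

int main(int argc, char *argv[]) {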
    std::string filename = "test.bin";

    std::ifstream is(filename, std::ifstream::binary);
    if (!is) {
        std::cerr << "Error while opening input file\n";
        return EXIT_FAILURE;
    }

    std::vector<int16_t> vec;
    int16_t value;

    while (is.read(reinterpret_cast<char*>(&value), sizeof(value))) {
        // swap value's endian, if needed...
        vec.push_back(value);
    }

    // use vec as needed...

    return 0;
}

That being said, if you really want to use istream_iterator for a binary file, then you would have to write a custom class/struct to wrap int16_t, and then define an operator>> for that type that calls read(), e.g.:

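#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

struct myInt16_t {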
    int16_t value; 
    operator int16_t() const { return value; }
};

std::istream& operator>>(std::istream &is, myInt16_t &v) {
    if (is.read(reinterpret_cast<char*>(&v.value), sizeof(v.value))) {
        // swap v.value's endian, if needed...
    }
    return is;
}

int main(int argc, char *argv[]) {
    std::string filename = "test.bin";

    std::ifstream is(filename, std::ifstream::binary);
    if (!is) {
        std::cerr << "Error while opening input file\n";
        return EXIT_FAILURE;
    }

    // std::noskipws is not needed here: read() is unformatted and never skips whitespace
    std::vector<int16_t> vec;
    std::copy(
        std::istream_iterator<myInt16_t>(is),
        std::istream_iterator<myInt16_t>(),
        std::back_inserter(vec)
    );

    // use vec as needed...

    return 0;
}

Answer:

Quoting the question: "Since I'm opening the file in the standard input mode ('textual' mode), I was expecting istream_iterator to work even if the file is binary."

You conceptually have it the wrong way around. Since the file "is binary" [1], you should not expect istream_iterator to work even though the file is opened "in text mode" [2]. The file's format determines what you can do with it; no tool can read "numbers formatted as human-readable text" from the file unless the file is actually intended to be read that way. Your file is, presumably, intended to be read as "pairs of bytes, each of which represents a 16-bit numeric value", so you need tools that are compatible with that format. The file mode is only a small piece of the puzzle.

To iterate over the file meaningfully, you need to open it in binary mode (to avoid corruption on Windows platforms) and also use a tool that is capable of interpreting binary data the way you want. You also need to make sure you are thinking about the data properly. Using something like noskipws makes no sense here: the data has no concept of whitespace, because it doesn't represent text.

  1. Strictly speaking, this doesn't really mean anything; but typically, saying that a file "is binary" suggests that the file contents are not intended to be human-readable, and that numeric values are represented as they would be in computer memory, i.e. directly in base 256, rather than using the bytes to represent text that in turn uses Arabic numerals, '.' symbols, etc. to represent a number.

  2. What it means to open a file "in text mode" depends on both the language you are using and the platform. In many languages (including C and C++), it has very little effect on Windows (basically just translating between CR-LF and LF line endings) and none (at least, last I checked) on Linux-like platforms. In some (like Python 3.x), it automatically brings in the machinery to convert bytes into objects representing actual text, using a real encoding rather than pretending that bytes "are characters" (they are not).
