I'm using small files currently for testing and will scale up once it works.
I made a file bigFile.txt that has:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
I'm running this to segment the data that is being read from the file:
#include <iostream>
#include <fstream>
#include <memory>
using namespace std;
int main()
{
    ifstream file("bigfile.txt", ios::binary | ios::ate);
    cout << file.tellg() << " Bytes" << '\n';
    ifstream bigFile("bigfile.txt");
    constexpr size_t bufferSize = 4;
    unique_ptr<char[]> buffer(new char[bufferSize]);
    while (bigFile)
    {
        bigFile.read(buffer.get(), bufferSize);
        // print the buffer data
        cout << buffer.get() << endl;
    }
}
This gives me the following result:
26 Bytes
ABCD
EFGH
IJKL
MNOP
QRST
UVWX
YZWX
Notice how, in the last line, 'WX' is repeated after 'Z'?
How do I get rid of it so that it stops after reaching the end?
Answer:
cout << buffer.get() uses the const char* overload, which prints a null-terminated C string. But your buffer isn't null-terminated, and istream::read() can read fewer characters than the buffer size. So when you print buffer, you end up printing old characters left over from the previous read, and then whatever follows the buffer in memory until a '\0' character happens to turn up. Reading past the end of the buffer like this is undefined behavior.
Use istream::gcount() to determine how many characters were read, and print exactly that many characters. For example, using std::string_view:
#include <iostream>
#include <fstream>
#include <memory>
#include <string_view>
using namespace std;
int main()
{
    ifstream file("bigfile.txt", ios::binary | ios::ate);
    cout << file.tellg() << " Bytes" << "\n";
    file.seekg(0, std::ios::beg); // rewind to the beginning
    constexpr size_t bufferSize = 4;
    unique_ptr<char[]> buffer = std::make_unique<char[]>(bufferSize);
    while (file)
    {
        file.read(buffer.get(), bufferSize);
        auto bytesRead = file.gcount();
        if (bytesRead == 0) {
            // EOF
            break;
        }
        // print the buffer data
        cout << std::string_view(buffer.get(), bytesRead) << endl;
    }
}
Note also that there's no need to open the file a second time: you can rewind the original stream to the beginning with seekg() and read from it.
Answer:
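The problem is that the last read doesn't overwrite the whole buffer. Here's what your code does:
- It reads the file four characters at a time, overwriting the whole buffer each time, up to 'UVWX'.
- When it reaches 'YZ', it reads those two characters and overwrites only the buffer's first two characters ('U' and 'V'), because it has reached the end of the file. The 'W' and 'X' from the previous read are still there.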
One easy fix is to clear the buffer before each file read:
#include <iostream>
#include <fstream>
#include <array>
#include <cstddef>
int main()
{
    std::ifstream bigFile("bigfile.txt", std::ios::binary | std::ios::ate);
    std::cout << bigFile.tellg() << " Bytes" << '\n';
    bigFile.seekg(0);
    constexpr std::size_t bufferSize = 4;
    std::array<char, bufferSize> buffer;
    while (bigFile)
    {
        // Clear the buffer so no characters from the previous read remain
        for (std::size_t i = 0; i < bufferSize; ++i)
            buffer[i] = '\0';
        bigFile.read(buffer.data(), bufferSize);
        // Nothing was read (the file size is a multiple of bufferSize): stop
        if (bigFile.gcount() == 0)
            break;
        // Print the buffer data (unused bytes are written as '\0')
        std::cout.write(buffer.data(), bufferSize) << '\n';
    }
}
I also changed:
- The std::unique_ptr<char[]> to a std::array, since we don't need dynamic allocation here and std::array is safer than a C-style array.
- The printing instruction to std::cout.write, because std::cout << caused undefined behavior (see @paddy's comment). std::cout << prints a null-terminated string (a sequence of characters terminated by a '\0' character), whereas std::cout.write prints a fixed number of characters.
- The second file opening to a call to the std::istream::seekg method (see @rustyx's answer).
Another way of doing this is to read the file character by character, put each character in the buffer, and print the buffer whenever it's full. After the main for loop, we print the buffer one last time if it holds characters that haven't been printed yet.
#include <iostream>
#include <fstream>
#include <array>
#include <cstddef>
int main()
{
    std::ifstream bigFile("bigfile.txt", std::ios::binary | std::ios::ate);
    if (!bigFile)
        return 1;
    std::streamoff fileSize = bigFile.tellg();
    std::cout << fileSize << " Bytes" << '\n';
    bigFile.seekg(0);
    constexpr std::size_t bufferSize = 4;
    std::array<char, bufferSize> buffer;
    std::size_t bufferIndex = 0;
    for (std::streamoff i = 0; i < fileSize; ++i)
    {
        // Add one character to the buffer
        bufferIndex = i % bufferSize;
        buffer[bufferIndex] = static_cast<char>(bigFile.get());
        // Print the buffer data once it is full
        if (bufferIndex == bufferSize - 1)
            std::cout.write(buffer.data(), bufferSize) << '\n';
    }
    // Print the buffer for the last time if it hasn't been already
    if (fileSize % bufferSize != 0)
    {
        // Overwrite the characters left over from the previous chunk (in this case 'W' and 'X')
        for (++bufferIndex; bufferIndex < bufferSize; ++bufferIndex)
            buffer[bufferIndex] = '\0';
        std::cout.write(buffer.data(), bufferSize) << '\n';
    }
}