Difference between using << operator with stringstream and write member function

Time:09-28

I have observed that when a uint8_t buffer (not guaranteed to be null-terminated) is written to a stringstream with the << operator, as in ss << buff.data, and the resulting std::string is returned to Python, Python throws an error:

UnicodeDecodeError: 'utf-8' codec can't decode byte

But, if I use ss.write(buff.data, buff.size), the issue is not there.

I assume that this issue is because when using <<, there is a buffer overrun, and the data in ss might not be UTF-8 anymore. But when I do write(), I define the size and so there is no possibility of garbage data.

What is surprising is that if I do ss.write(buff.data, buff.size + 1), I always observe a segfault. So I can't figure out how << can overrun the buffer without crashing. Is there a fundamental difference between how the two work, so that one triggers a segfault when it makes an illegal buffer access and the other does not? Or is << just getting lucky?

Answer:

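uint8_t is, on practically every platform, an alias for unsigned char. When operator<< is given an unsigned char* pointer, it treats it as a null-terminated string, same as a char* pointer, and keeps reading until it finds a zero byte. So, if your data is not actually a null-terminated character string, writing it to the stream using operator<< is undefined behavior, because it reads past the end of the buffer. The code may crash. It may write garbage to the stream. There is no way to know. An over-read that happens to stay inside mapped memory will not segfault, so << not crashing is just luck.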
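
Here is a minimal sketch of that null-terminator behavior, written so that it does not invoke undefined behavior. The function name stream_with_insertion and the sample bytes are made up for illustration, and the code assumes uint8_t is unsigned char on your platform. The buffer deliberately contains a zero byte, so operator<< stops there and drops everything after it.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: streams a byte pointer with operator<<.
// The stream treats `data` as a C string and reads until the first
// zero byte, so the caller must guarantee that one exists.
std::string stream_with_insertion(const std::uint8_t* data) {
    std::stringstream ss;
    ss << data;  // selects the `const unsigned char*` overload
    return ss.str();
}
```

Calling it on the five bytes {'a', 'b', 0, 'c', 'd'} should return the two-character string "ab". Without the zero byte, the read would continue into whatever memory follows the array.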

write() doesn't care about a null terminator. It writes exactly as many bytes as you specify. That is why you don't have any trouble when using write() instead of operator<<.
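
For comparison, here is a sketch of the write() approach. The name stream_with_write is hypothetical. The reinterpret_cast is there because write() on a char-based stream expects a const char*, which your buff.data presumably is not.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: copies exactly `size` bytes into the stream.
// No null terminator is required, and embedded zero bytes are kept.
std::string stream_with_write(const std::uint8_t* data, std::size_t size) {
    std::stringstream ss;
    ss.write(reinterpret_cast<const char*>(data),
             static_cast<std::streamsize>(size));
    return ss.str();
}
```

Given the unterminated bytes {0xDE, 0xAD, 0x00, 0xBE, 0xEF} and a size of 5, the returned string should hold all five bytes. Note that write() only guarantees that nothing beyond the buffer is copied. If the buffer itself holds bytes that are not valid UTF-8, decoding the result as a Python str can still fail.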
