What actually is done when `string::c_str()` is invoked?

Does string::c_str() allocate memory, copy the internal data of the string object, and append a null terminator to the newly allocated memory?

Or, since string::c_str() must be O(1), is allocating memory and copying the string over no longer allowed, so that in practice keeping the null terminator there all the time is the only sane implementation?

Somebody in the comments on this answer to this question says that C++11 requires std::string to allocate an extra char for a trailing '\0'. So it seems the second option is possible.

And another person says that std::string operations - e.g. iteration, concatenation and element mutation - don't need the zero terminator. Unless you pass the string to a function expecting a zero terminated string, it can be omitted.

And here is more from an expert:

Why is it common for implementers to make .data() and .c_str() do the same thing?

Because it is more efficient to do so. The only way to make .data() return something that is not null-terminated would be to have .c_str() or .data() copy their internal buffer, or to just use 2 buffers. Always keeping a single null-terminated buffer means that you can use just one internal buffer when implementing std::string.

So I am really confused now, what actually is done when string::c_str() is invoked?

Update:

Suppose c_str() is implemented as simply returning the pointer to the buffer that the string has already allocated and manages:

A. Since c_str() must be null-terminated, the internal buffer always needs to be null-terminated, even for an empty std::string, e.g. std::string demo_str;, so there should be a \0 in the internal memory of demo_str. Am I right?

B. What happens when std::string::substr() is invoked? Is a \0 automatically appended to the substring?

CodePudding user response：

Since C++11, std::string::c_str() and std::string::data() are both required to return a pointer to the string's internal buffer. And since the array returned by c_str() must be null-terminated (before C++11, data() carried no such requirement), that effectively requires the internal buffer to always be null-terminated, though the null terminator is not counted by size()/length(), or returned by std::string iterators, etc.

Prior to C++11, the behavior of c_str() was technically implementation-defined, but most implementations worked this way, as it is the simplest and sanest way to implement it. C++11 just standardized the behavior that was already in wide use.
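
To make questions A and B from the update concrete, here is a minimal sketch, assuming a C++11 or later standard library. The helper names is_terminated_in_place and check_examples are made up for illustration and are not library functions. The sketch checks that an empty string still exposes a terminator, that size() does not count it, and that substr() yields a separate std::string object that carries its own terminator.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of the standard library): true if s exposes
// a single null-terminated buffer through both accessors.
bool is_terminated_in_place(const std::string& s)
{
    const char* p = s.c_str();
    return p == s.data()            // both accessors return the same pointer
        && p[s.size()] == '\0';     // terminator sits just past the last character
}

// Hypothetical demo covering the empty-string and substr() cases.
void check_examples()
{
    std::string demo_str;                       // empty string
    assert(demo_str.size() == 0);
    assert(demo_str.c_str()[0] == '\0');        // still points at a terminator
    assert(is_terminated_in_place(demo_str));

    std::string s = "hello world";
    assert(s.size() == 11);                     // terminator is not counted
    assert(is_terminated_in_place(s));

    std::string sub = s.substr(0, 5);           // a separate std::string object
    assert(sub == "hello");
    assert(is_terminated_in_place(sub));        // with its own terminator
    assert(s.c_str()[5] == ' ');                // the original is left untouched
}
```

Calling check_examples() from main() should run silently when built without NDEBUG. Nothing is written into the original string when substr() is called, because the result is a new object that manages its own buffer.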

CodePudding user response：

Here is an empirical "proof" that the complexity of .c_str() is O(1):

#include <string>

int main()
{
    std::string x(5000000, 'b'); // <--- single allocation up front
    // std::string x(5, 'b');    // <--- compare to a much shorter string
    for (unsigned int i = 0; i < 1000000; i++)
    {
        const char *y = x.c_str(); // <--- would this copy the entire content?
        (void)y;                   // silence the unused-variable warning
    }
}

Compiled with -O0 to avoid optimizing anything out.
Timing the two versions, I get identical performance.
This is an empirical "proof" that (at least in my machine's implementation) c_str():
- returns the internal representation, which is already a null-terminated string
- doesn't copy the content every time it is called.
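
Timing can be noisy, so a complementary check is to compare pointers instead of clocks. The following is only a sketch, assuming a C++11 or later standard library. The helper names c_str_returns_same_buffer and check_pointer_stability are hypothetical. The idea is that if c_str() returns the address of the string's own first element on every call, no per-call copy is being handed out.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true if `calls` consecutive c_str() calls on an
// unmodified string all return the address of the string's own first element.
bool c_str_returns_same_buffer(const std::string& x, unsigned int calls)
{
    const char* first = x.c_str();
    if (first != &x[0])                 // c_str() must alias the string's storage
        return false;
    for (unsigned int i = 0; i < calls; ++i)
        if (x.c_str() != first)         // the pointer must not change between calls
            return false;
    return true;
}

// Hypothetical demo using the same two string sizes as the timing test.
void check_pointer_stability()
{
    std::string big(5000000, 'b');
    std::string small(5, 'b');
    assert(c_str_returns_same_buffer(big, 1000));
    assert(c_str_returns_same_buffer(small, 1000));
}
```

Unlike the timing loop, this check does not depend on the optimization level or on machine load, though it only inspects the implementation it is compiled against.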