What actually happens when string::c_str() is invoked?
- Does string::c_str() allocate memory, copy the internal data of the string object, and append a null terminator to the newly allocated memory?
or
- Since string::c_str() must be O(1), allocating memory and copying the string over is no longer allowed. In practice, having the null terminator there all the time is the only sane implementation.
Somebody in the comments on this answer to this question says that C++11 requires std::string to allocate an extra char for a trailing '\0'. So it seems the second option is possible.
And another person says that std::string operations (e.g. iteration, concatenation and element mutation) don't need the zero terminator. Unless you pass the string to a function expecting a zero-terminated string, it can be omitted.
And one more opinion, from an expert:
Why is it common for implementers to make .data() and .c_str() do the same thing?
Because it is more efficient to do so. The only way to make .data() return something that is not null terminated, would be to have .c_str() or .data() copy their internal buffer, or to just use 2 buffers. Having a single null terminated buffer always means that you can always use just one internal buffer when implementing std::string.
So I am really confused now: what actually happens when string::c_str() is invoked?
Update:
Suppose c_str() is implemented as simply returning the pointer to the buffer that the string has already allocated and manages.
A. Since the result of c_str() must be null-terminated, the internal buffer needs to always be null-terminated, even for an empty std::string. E.g. for std::string demo_str; there should be a \0 in the internal memory of demo_str. Am I right?
B. What would happen when std::string::substr() is invoked? Is a \0 automatically appended to the substring?
CodePudding user response:
Since C++11, std::string::c_str() and std::string::data() are both required to return a pointer to the string's internal buffer. And since that pointer must be null-terminated (before C++11 the guarantee applied to c_str() but not to data()), the internal buffer effectively has to be null-terminated at all times, though the null terminator is not counted by size()/length(), or returned by std::string iterators, etc.
Prior to C++11, the behavior of c_str() was technically implementation-defined, but most implementations worked this way, as it is the simplest and sanest way to implement it. C++11 just standardized the behavior that was already in wide use.
CodePudding user response:
Here is an empirical "proof" that the complexity of .c_str() is O(1):
#include <stdio.h>
#include <string>
int main()
{
    std::string x(5000000, 'b'); // <--- single allocation, done once up front
    // std::string x(5, 'b'); // <--- compare to a much shorter string
    for (unsigned int i = 0; i < 1000000; i++)
    {
        const char *y = x.c_str(); // <--- would this copy the entire content?
        (void)y; // silence the unused-variable warning
    }
}
- compiled with -O0 to avoid optimizing anything out
- timing the 2 versions: I get identical performance
- this is an empirical "proof" that (at least my machine's implementation)
  - returns the internal representation, which is already a null-terminated string
  - doesn't copy the content every time .c_str() is called.