I want to add many strings to a vector, and from what I've found, calling reserve() before this is more efficient. For a vector of ints, this makes sense because int is 4 bytes, so calling reserve(10) clearly reserves 40 bytes. I know the number of strings, which is about 60000. Should I call vector.reserve(60000)? How would the compiler know the size of my strings, as it doesn't know if these strings are of length 5 or 500?
CodePudding user response:
The compiler doesn't know the size of the strings; it knows the size of a std::string object, and that size does not depend on the length of the string it holds. That is because, most of the time [1], std::string allocates its character data on the heap, so the object itself is little more than a pointer and a length.
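A minimal sketch of this point, assuming a hypothetical helper named object_size (it is not part of the standard library): sizeof reports the size of the std::string object, which should be the same whether the string holds 5 characters or 500.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration: size in bytes of the std::string
// object itself, not of the character data it manages.
std::size_t object_size(const std::string& s) {
    return sizeof(s);  // sizeof on a reference yields sizeof(std::string)
}
```

Calling object_size(std::string(5, 'a')) and object_size(std::string(500, 'a')) should return the same value; the exact number depends on the standard library implementation.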
This also means that when you reserve the vector, you don't yet reserve memory for the strings' characters. This is, however, not always a problem. std::strings come from somewhere: if the strings you receive are the return value of a function (i.e., you have them by value), then the memory is already allocated for the string (in the return value). Thus, e.g., std::swap() or std::move() can speed up populating the vector with the results, because the existing buffer is handed over instead of copied.
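As a rough sketch of the by-value case, assuming a hypothetical producer make_line and a hypothetical collector collect (neither name comes from the question), each returned string can be moved into the pre-reserved vector:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical producer: returns a string by value, so its character
// buffer is already allocated when the caller receives it.
std::string make_line(std::size_t i) {
    return "line number " + std::to_string(i) + " with some longer payload text";
}

// Hypothetical collector: reserves once, then moves each result in.
std::vector<std::string> collect(std::size_t count) {
    std::vector<std::string> v;
    v.reserve(count);  // room for `count` string objects, no characters yet
    for (std::size_t i = 0; i < count; ++i) {
        std::string s = make_line(i);
        // Moving hands over the heap buffer instead of copying it
        // (strings short enough for an inline buffer are copied instead).
        v.push_back(std::move(s));
    }
    return v;
}
```

With the count known up front, the vector's own storage should not need to grow during the loop; the only per-string allocation is the one the producer already made.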
If, however, you populate it by passing references, the callee performs the operations that allocate. In this case, you'd likely want to loop over the vector and reserve each string. Note that the strings must already exist for the loop to visit them, so the vector is constructed with its elements rather than only reserved:
std::vector<std::string> v(60000); // expected number of strings, default-constructed
for (auto& s : v) {
    s.reserve(500); // expected/max. size of strings
}
[1] Many std::string implementations keep a small, fixed-size buffer inside the object for short strings and allocate on the heap only when the string is longer than that. There was a debate in the standardization group on whether to allow it.
CodePudding user response:
Roughly speaking, a std::string implementation consists of a pointer to the character buffer that holds the string. This character buffer is dynamically allocated on the heap (not always; see the short string optimization). So the space you reserve for the vector is never used for the character buffers: it holds only the std::string objects, and for every string that you add to the vector, the character buffer is still allocated dynamically and separately.
The size of the std::string class is known at compile time, and is equal to sizeof(std::string). reserve() takes a number of elements, not a number of bytes, so in your case you should just call v.reserve(n) if you are expecting to insert n strings into the vector v; the vector multiplies by sizeof(std::string) internally.
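A small sketch of what reserve(n) does and does not do, using a hypothetical Stats struct and reserve_stats function introduced only for illustration: after the call the vector still holds no strings, but it has room for at least n of them.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type for illustration.
struct Stats {
    std::size_t size;      // number of strings actually stored
    std::size_t capacity;  // number of strings that fit without reallocating
};

// Hypothetical helper: reserve room for n strings and report the state.
Stats reserve_stats(std::size_t n) {
    std::vector<std::string> v;
    v.reserve(n);  // n is an element count, not a byte count
    return Stats{v.size(), v.capacity()};
}
```

For reserve_stats(60000), size should be 0 and capacity should be at least 60000.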
CodePudding user response:
I want to add many strings to a vector, and from what I've found, calling reserve() before this is more efficient.
If you know up front how many strings you want to store in the vector, then yes.
For a vector of ints, this makes sense because int is 4 bytes, so calling reserve(10) clearly reserves 40 bytes.
Yes, as it is allocating memory for sizeof(int) * 10 bytes.
I know the number of strings, which is about 60000. Should I call vector.reserve(60000)?
Yes.
How would the compiler know the size of my strings, as it doesn't know if these strings are of length 5 or 500?
The compiler doesn't need to know the length of the strings. Obviously, that is not known until runtime. However, that length doesn't change the compile-time size of the std::string class itself, which has a fixed layout and size. One of its data members is a pointer to the actual character data, which is typically stored elsewhere in dynamic memory and thus is not counted toward the size of the std::string object itself.
However, in the case of Short-String Optimization, the std::string class includes a small fixed buffer, which does count towards its fixed size at compile-time. Its data pointer points at that buffer until the character data grows beyond the size of the buffer; then std::string allocates dynamic memory to hold the larger character data. The SSO buffer still exists in the object, just unused at that point.
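One way to observe this, sketched with a hypothetical helper stored_inline (not a standard function), is to check whether the character data lives inside the object's own bytes. Whether a short string is stored inline, and up to what length, is implementation-specific, so treat the result for short strings as an observation about your library rather than a guarantee.

```cpp
#include <functional>
#include <string>

// Hypothetical helper for illustration: true if the string's character data
// lies inside the std::string object itself (i.e. in an inline buffer).
bool stored_inline(const std::string& s) {
    const char* obj_begin = reinterpret_cast<const char*>(&s);
    const char* obj_end = obj_begin + sizeof(std::string);
    const char* data = s.data();
    // std::less / std::less_equal give a well-defined order even for
    // pointers into unrelated objects.
    return std::less_equal<const char*>()(obj_begin, data) &&
           std::less<const char*>()(data, obj_end);
}
```

A 500-character string cannot fit inside the object, so stored_inline should return false for it; for a 5-character string the answer depends on whether the implementation uses SSO.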
reserve() will allocate space only for the std::string objects themselves, not for any dynamic memory used for their character data. When a std::string object points at dynamic memory for its character data, that is irrelevant to the memory that std::vector allocates.
So yes, you would call reserve(60000) if you want to reserve space for 60000 std::string objects. That would allocate memory for sizeof(std::string) * 60000 bytes in the vector.
So, in general, reserve() allocates sizeof(vector::value_type) * capacity number of bytes. Then the vector creates instances of the value_type inside that memory as needed.
Or, in other words, when you want to pre-allocate memory for n number of elements, you ask reserve() to allocate memory for n number of elements. Period. The details of what those elements do internally are irrelevant to the vector. That is for the elements to handle on their own.
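The byte count described above can be observed with a custom allocator. The following is only a sketch: LoggingAllocator, g_last_request_bytes, and bytes_requested_by_reserve are hypothetical names made up for this illustration, and the standard only guarantees that the capacity after reserve(n) is at least n, so the request may be larger than n * sizeof(std::string).

```cpp
#include <cstddef>
#include <new>
#include <string>
#include <vector>

// Hypothetical global for illustration: bytes asked for in the most recent
// allocation made through LoggingAllocator.
static std::size_t g_last_request_bytes = 0;

// Hypothetical minimal allocator that records the size of each request.
template <class T>
struct LoggingAllocator {
    using value_type = T;

    LoggingAllocator() = default;
    template <class U>
    LoggingAllocator(const LoggingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        g_last_request_bytes = n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

template <class T, class U>
bool operator==(const LoggingAllocator<T>&, const LoggingAllocator<U>&) noexcept {
    return true;
}
template <class T, class U>
bool operator!=(const LoggingAllocator<T>&, const LoggingAllocator<U>&) noexcept {
    return false;
}

// Hypothetical helper: how many bytes does the vector request for n strings?
std::size_t bytes_requested_by_reserve(std::size_t n) {
    std::vector<std::string, LoggingAllocator<std::string>> v;
    g_last_request_bytes = 0;  // ignore anything allocated during construction
    v.reserve(n);              // storage for the string objects only
    return g_last_request_bytes;
}
```

bytes_requested_by_reserve(60000) should report at least 60000 * sizeof(std::string) bytes, regardless of how long the strings later turn out to be.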