How to reserve a vector of strings, if string size is variable?

Time:09-01

I want to add many strings to a vector, and from what I've found, calling reserve() before this is more efficient. For a vector of ints, this makes sense because int is 4 bytes, so calling reserve(10) clearly reserves 40 bytes. I know the number of strings, which is about 60000. Should I call vector.reserve(60000)? How would the compiler know the size of my strings, as it doesn't know if these strings are of length 5 or 500?

Answer:

The compiler doesn't know the size of the strings; it knows the size of the std::string object, and that size does not depend on the length of the string. This is because std::string usually [1] keeps its characters on the heap, so the object itself holds only a pointer to them plus some bookkeeping (typically the length and the capacity).
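
A minimal sketch of this point, using only the standard library (the helper names are made up for illustration): the size of a std::string object is a compile-time constant, whatever the string holds. The exact value is implementation-specific, so the sketch compares sizes instead of relying on a particular number.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// sizeof(std::string) is fixed at compile time. The exact value depends on
// the standard library (commonly 24 or 32 bytes on 64-bit platforms).
constexpr std::size_t string_object_size() { return sizeof(std::string); }

// Hypothetical helper: builds a string with `length` characters and returns
// the size of the std::string object that holds them.
std::size_t object_size_for_length(std::size_t length) {
    // Long contents live in a separate heap buffer; short ones may sit in an
    // inline SSO buffer. Either way sizeof(s) does not change.
    std::string s(length, 'x');
    assert(s.size() == length);
    return sizeof(s);  // always equal to sizeof(std::string)
}
```

Calling object_size_for_length(5) and object_size_for_length(500) returns the same value, which is exactly what the vector needs to know.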

This also means that reserving the vector does not reserve any memory for the strings' characters. That is not always a problem, though. The strings come from somewhere: if you receive them as the return values of a function (i.e., by value), the memory for each string has already been allocated (in the return value). In that case, moving them into the vector (std::move, or std::swap()) avoids copying the characters and speeds up populating the vector.
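
A sketch of the by-value case, where make_item() is a hypothetical stand-in for whatever function produces your strings. Moving each returned string into the vector hands over the buffer it already allocated; push_back does this automatically for temporaries, and std::move is the usual way to request it for a named string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical producer that returns each string by value.
std::string make_item(std::size_t i) {
    return "item-" + std::to_string(i);
}

std::vector<std::string> collect(std::size_t n) {
    std::vector<std::string> v;
    v.reserve(n);               // room for n std::string objects
    assert(v.capacity() >= n);  // guaranteed by reserve()
    for (std::size_t i = 0; i < n; ++i) {
        // make_item(i) is a temporary (an rvalue), so push_back moves it in:
        // a heap-allocated character buffer is handed over, not copied.
        v.push_back(make_item(i));
    }
    return v;
}

// A named string is an lvalue, so the move has to be requested explicitly.
void append_string(std::vector<std::string>& v, std::string s) {
    v.push_back(std::move(s));  // s is left valid but unspecified
}
```

For strings short enough to fit an SSO buffer, a move still copies those few characters, which is cheap anyway.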

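If, however, you populate the strings by passing references to them, the callee performs the operations that allocate. In this case, you may want to create the strings up front and reserve capacity in each one. Note that the vector has to be sized for this, not just reserved: reserve() creates no elements, so a loop over a merely reserved vector would do nothing.

#include <string>
#include <vector>

std::vector<std::string> make_strings() { // hypothetical helper name
    std::vector<std::string> v(60000); // expected number of strings (creates 60000 empty strings)
    for (auto& s : v) {
        s.reserve(500); // expected/max. size of strings
    }
    return v;
}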
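
A sketch of the by-reference case, with fill_line() as a hypothetical callee that writes into a string it is given. Keep the cost in mind: reserving 500 characters for each of 60000 strings sets aside roughly 30 MB of character storage up front (60000 * 500 bytes, in 60000 separate allocations), whether or not the strings end up that long.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical callee that writes its result into a caller-provided string.
void fill_line(std::string& out, std::size_t i) {
    out.clear();  // empties the string; implementations keep the capacity in practice
    out += "line ";
    out += std::to_string(i);
}

std::vector<std::string> make_lines(std::size_t count, std::size_t expected_len) {
    std::vector<std::string> v(count);  // count empty strings, not just capacity
    for (auto& s : v) {
        s.reserve(expected_len);        // one allocation per string, up front
        assert(s.capacity() >= expected_len);
    }
    for (std::size_t i = 0; i < v.size(); ++i) {
        // Results of up to expected_len characters fit in the reserved
        // capacity, so in practice no further allocation is needed.
        fill_line(v[i], i);
    }
    return v;
}
```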

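[1] Most current standard library implementations (libstdc++, libc++ and MSVC's among them) use a short string optimization: the std::string object contains a small, fixed-size buffer for short strings and allocates on the heap only when the string is longer than that. There was debate in the standardization committee about which implementation strategies to allow; the standard permits this optimization but does not require it.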

Answer:

Roughly speaking, a std::string implementation consists of a pointer to the character buffer that holds the string's contents. This character buffer is dynamically allocated on the heap (not always; see short string optimization). So the space you reserve in the vector is used only for the std::string objects themselves, never for their character buffers: for every string that you add to the vector, the character buffer is allocated separately (unless it fits in the short string buffer).

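The size of the std::string class is known at compile time and is equal to sizeof(std::string). In your case, if you expect to insert n strings into the vector v, just call v.reserve(n). reserve() takes a number of elements, not bytes; the vector multiplies by sizeof(std::string) itself, so v.reserve(n * sizeof(std::string)) would reserve room for far more strings than you need.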
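
A short sketch of the difference, with hypothetical function names: because reserve() counts elements, the byte-based call over-reserves by a factor of sizeof(std::string).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Correct: reserve() takes an element count, so this makes room for n
// std::string objects (about n * sizeof(std::string) bytes).
std::vector<std::string> reserve_elements(std::size_t n) {
    std::vector<std::string> v;
    v.reserve(n);
    assert(v.empty() && v.capacity() >= n);  // no strings constructed yet
    return v;
}

// Mistaken: n * sizeof(std::string) is a byte count, but reserve() still
// reads it as an element count, so this reserves sizeof(std::string) times
// as many std::string objects as intended.
std::vector<std::string> reserve_bytes_by_mistake(std::size_t n) {
    std::vector<std::string> v;
    v.reserve(n * sizeof(std::string));
    assert(v.capacity() >= n * sizeof(std::string));
    return v;
}
```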

Answer:

I want to add many strings to a vector, and from what I've found, calling reserve() before this is more efficient.

If you know up front how many strings you want to store in the vector, then yes.

For a vector of ints, this makes sense because int is 4 bytes, so calling reserve(10) clearly reserves 40 bytes.

Yes: it allocates memory for (at least) sizeof(int) * 10 bytes, which is 40 bytes where int is 4 bytes.

I know the number of strings, which is about 60000. Should I call vector.reserve(60000)?

Yes.

How would the compiler know the size of my strings, as it doesn't know if these strings are of length 5 or 500?

The compiler doesn't need to know the length of the strings; that is not known until runtime anyway. The length doesn't change the compile-time size of the std::string class itself, which has a fixed layout and size. One of its data members is a pointer to the actual character data, which is typically stored elsewhere in dynamic memory and thus is not counted toward the size of the std::string object itself.

However, with the short string optimization (SSO), the std::string class includes a small fixed-size buffer, which does count towards its fixed compile-time size. Short strings are stored in that buffer (in some implementations the data pointer points at it); once the character data grows beyond the size of the buffer, std::string allocates dynamic memory to hold it. The buffer's bytes remain part of the object at that point, either unused or, in implementations that share that storage with other members (such as the heap pointer or the capacity), reused for them.
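
To check whether a particular string is using the inline buffer, one rough, implementation-specific trick is to test whether data() points inside the object. uses_inline_buffer() below is a hypothetical helper written for illustration, not a standard facility.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Returns true if s.data() points inside the std::string object itself,
// i.e. the characters are held in an SSO buffer rather than in separately
// allocated memory. The result is implementation-specific.
bool uses_inline_buffer(const std::string& s) {
    const char* p = s.data();
    assert(p != nullptr);  // data() never returns a null pointer
    const char* obj_begin = reinterpret_cast<const char*>(&s);
    const char* obj_end = obj_begin + sizeof(s);
    // std::less gives a total order even for pointers into unrelated objects.
    std::less<const char*> before;
    return !before(p, obj_begin) && before(p, obj_end);
}
```

A 1000-character string should report false on any implementation. A very short string typically reports true on libstdc++, libc++ and MSVC, but the standard does not guarantee SSO, so code should not depend on it.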

reserve() will allocate space only for the std::string objects themselves, not for any dynamic memory used for their character data. When a std::string object points at dynamic memory for its character data, that is irrelevant to the memory that std::vector allocates.

So yes, you would call reserve(60000) if you want to reserve space for 60000 std::string objects. That would allocate memory for sizeof(std::string) * 60000 bytes in the vector.
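
A sketch (the function name is hypothetical) that relies on the standard's guarantee that no reallocation happens while the vector's size stays within the reserved capacity: the vector's buffer address stays the same while the strings, whatever their lengths, manage their own characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Fills a vector with n strings of varying length after reserving room for
// exactly n std::string objects. Since the size never exceeds the reserved
// capacity, the vector's own buffer is never reallocated.
std::vector<std::string> fill_without_reallocation(std::size_t n) {
    std::vector<std::string> v;
    v.reserve(n);
    const std::string* const buffer = v.data();
    for (std::size_t i = 0; i < n; ++i) {
        // Lengths 1..500: each string may allocate its own characters,
        // which has no effect on the vector's buffer.
        v.emplace_back(i % 500 + 1, 'a');
        assert(v.data() == buffer);
    }
    return v;
}
```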

So, in general, reserve(n) on a std::vector<T> allocates (at least) sizeof(std::vector<T>::value_type) * n bytes, i.e. sizeof(T) * n. The vector then constructs value_type objects inside that memory as elements are added.
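
A generic sketch of that rule, with reserve_and_measure() as a hypothetical helper: the byte count depends only on sizeof(value_type), so it is computed the same way for int and std::string, and it leaves out anything the elements allocate themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: reserves room for n elements and returns the size in
// bytes of the vector's element buffer, capacity() * sizeof(value_type).
// Memory the elements allocate on their own (such as a std::string's heap
// characters) is not part of this figure.
template <typename T>
std::size_t reserve_and_measure(std::vector<T>& v, std::size_t n) {
    v.reserve(n);
    assert(v.capacity() >= n);  // reserve() guarantees at least n
    return v.capacity() * sizeof(typename std::vector<T>::value_type);
}
```

For a std::vector<int> with n = 10 this is at least 10 * sizeof(int); for a std::vector<std::string> with n = 60000 it is at least 60000 * sizeof(std::string), no matter how long the strings later become.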

Or, in other words, when you want to pre-allocate memory for n number of elements, you ask reserve() to allocate memory for n number of elements. Period. The details of what those elements do internally is irrelevant to the vector. That is for the elements to handle on their own.
