I'm currently working through Stanley Lippman's C Primer. In Chapter 10 generic algorithms are introduced.
As an example std::sort
, std::unique
and the std::vector
member function erase
shall be used to remove duplicate elements within a vector.
To see how a vectors elements are rearranged by std::unique I tried to print every element just to find that not all elements are printed. However a call to .size()
tells that the size of the vector is as expected unchanged.
After compiling the program:
clang -std=c 11 -o elimDubs elimDubs.cc
and calling the programm with
./elimDubs the quick red fox jumps over the slow red turtle
The program prints
Size after std::unique: 10
fox jumps over quick red slow the turtle the
which are only 9 of the 10 elements. (red
is missing)
Why? For the program it doesn't really matter as the subsequent call of erase
is used to remove the duplicate elements anyways, still it irritates me that there are elements missing or at least not being printed.
#include <vector>
#include <string>
#include <iostream>
#include <algorithm>
void elimDubs( std::vector<std::string> &words )
{
std::sort( words.begin(), words.end() );
auto end_unique = std::unique( words.begin(), words.end() );
std::cout << "Size after std::unique: "
<< words.size() << std::endl;
for ( const auto &el : words )
std::cout << el << " ";
std::cout << std::endl;
}
int main(int argc, char **argv)
{
std::vector<std::string> sentence;
if ( argc < 2 )
return -1;
std::copy( argv 1, argv argc,
std::back_inserter(sentence) );
elimDubs( sentence );
}
CodePudding user response:
std::unique
is a destructive process. Quoting cppreference,
Removing is done by shifting the elements in the range in such a way that elements to be erased are overwritten.
This means that any elements after the new end iterator returned by std::unique
are going to be in a valid but unspecified state. They aren't meant to be accessed as they should be removed from the vector with a call to erase
.
This is also noted in the note section:
Iterators in
[r, last)
(if any), wherer
is the return value, are still dereferenceable, but the elements themselves have unspecified values. A call tounique
is typically followed by a call to a container'serase
member function, which erases the unspecified values and reduces the physical size of the container to match its new logical size.
CodePudding user response:
There's still 10 elements; it's just that one of them is "moved from". If you change your print loop to quote the words, thus:
for ( const auto &el : words )
std::cout << "'" << el << "'" << " ";
you'll see the following output:
'fox' 'jumps' 'over' 'quick' 'red' 'slow' 'the' 'turtle' 'the' ''