Home > Blockchain >  Rearranged vector elements after std::unique
Rearranged vector elements after std::unique

Time:09-24

I'm currently working through Stanley Lippman's C Primer. In Chapter 10 generic algorithms are introduced.

As an example std::sort, std::unique and the std::vector member function erase shall be used to remove duplicate elements within a vector.

To see how a vectors elements are rearranged by std::unique I tried to print every element just to find that not all elements are printed. However a call to .size() tells that the size of the vector is as expected unchanged.

After compiling the program:

clang   -std=c  11 -o elimDubs elimDubs.cc

and calling the programm with

./elimDubs the quick red fox jumps over the slow red turtle

The program prints

Size after std::unique: 10
fox jumps over quick red slow the turtle the  

which are only 9 of the 10 elements. (red is missing) Why? For the program it doesn't really matter as the subsequent call of erase is used to remove the duplicate elements anyways, still it irritates me that there are elements missing or at least not being printed.

#include <vector>
#include <string>
#include <iostream>
#include <algorithm>

void elimDubs( std::vector<std::string> &words )
{
  std::sort( words.begin(), words.end() );

  auto end_unique = std::unique( words.begin(), words.end() );


  std::cout << "Size after std::unique: "
            << words.size() << std::endl;

  for ( const auto &el : words )
    std::cout << el << " ";
  std::cout << std::endl;
}



int main(int argc, char **argv)
{
  std::vector<std::string> sentence;

  if ( argc < 2 )
    return -1;

  std::copy( argv   1, argv   argc,
             std::back_inserter(sentence) );

  elimDubs( sentence );
}

CodePudding user response:

std::unique is a destructive process. Quoting cppreference,

Removing is done by shifting the elements in the range in such a way that elements to be erased are overwritten.

This means that any elements after the new end iterator returned by std::unique are going to be in a valid but unspecified state. They aren't meant to be accessed as they should be removed from the vector with a call to erase.

This is also noted in the note section:

Iterators in [r, last) (if any), where r is the return value, are still dereferenceable, but the elements themselves have unspecified values. A call to unique is typically followed by a call to a container's erase member function, which erases the unspecified values and reduces the physical size of the container to match its new logical size.

CodePudding user response:

There's still 10 elements; it's just that one of them is "moved from". If you change your print loop to quote the words, thus:

  for ( const auto &el : words )
    std::cout << "'" << el << "'" << " ";

you'll see the following output:

'fox' 'jumps' 'over' 'quick' 'red' 'slow' 'the' 'turtle' 'the' ''
  • Related