Home > database >  std::sort results in a segfault
std::sort results in a segfault

Time:06-24

I wrote a program that counts the number of occurrences of strings using a std::unordered_map, then copies the values from the map to a std::vector and calls std::sort to sort them first by counts, then by string.

See the code below. Please, excuse the long list of strings. I couldn't quickly find a shorter list that would reproduce the behavior.

Compiler explorer concurs the segfault for GCC 10 and above.

Is this a compiler bug or am I missing something obvious here?

#include <unordered_map>
#include <vector>
#include <algorithm>
#include <string>

int main(int argc, char** argv) {
  std::unordered_map<
    std::string,
    int
  > terms;

  std::vector<const char*> input {
"Head", "of", "Data", "News", "UK", "We", "are", "News", "UK", "is", "a",
"great", "company", "full", "talented", "and", "creative", "people", "We",
"are", "an", "organisation", "that", "holds", "journalism", "at", "its",
"very", "heart", "Our", "newspapers", "and", "digital", "products",
"include", "some", "of", "the", "most", "powerful", "media", "brands", "in",
"the", "English", "speaking", "world:", "the", "Times", "The", "Sunday",
"Times", "and", "The", "Sun", "reaching", "30", "million", "people", "each",
"week", "Despite", "differences", "in", "audience", "and", "content", "our",
"brands", "are", "united", "by", "a", "commitment", "to", "independent",
"journalism", "that", "connects", "our", "customers"
  };

  for (const char* str : input) {
      terms[str];
  }

  std::vector<std::pair<std::string,int>> order;
  order.reserve(terms.size());
  for (const auto& [ term, n ] : terms)
    order.emplace_back(term,n);

  std::sort(order.begin(), order.end(),
    [](const auto& a, const auto& b) -> bool {
      if (a.second > b.second) return true;
      return a.first < b.first;
    }
  );
}

Here's the output from gdb:

Program received signal SIGSEGV, Segmentation fault.
__memcmp_evex_movbe () at ../sysdeps/x86_64/multiarch/memcmp-evex-movbe.S:118
118     VMOVU_MASK (%rsi), %YMM2{%k2}
(gdb) bt
#0  __memcmp_evex_movbe () at ../sysdeps/x86_64/multiarch/memcmp-evex-movbe.S:118
#1  0x00007ffff7d4d7ad in std::char_traits<char>::compare (__n=<optimized out>, __s2=<optimized out>, __s1=<optimized out>)
    at /usr/src/debug/gcc-build/x86_64-pc-linux-gnu/libstdc  -v3/include/bits/char_traits.h:385
#2  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::compare (this=<optimized out>, __str=...)
    at /usr/src/debug/gcc-build/x86_64-pc-linux-gnu/libstdc  -v3/include/bits/basic_string.h:3148
#3  0x000055555555b408 in std::operator< <char, std::char_traits<char>, std::allocator<char> > (__lhs="Data", __rhs=<error: Cannot access memory at address 0x72656d6f74737563>)
    at /usr/include/c  /12.1.0/bits/basic_string.h:3694
#4  0x00005555555573e2 in operator()<std::pair<std::__cxx11::basic_string<char>, int>, std::pair<std::__cxx11::basic_string<char>, int> > (__closure=0x7fffffffd9f7, a={...}, b={...}) at test.cc:108
#5  0x000055555555744b in __gnu_cxx::__ops::_Val_comp_iter<main(int, char**)::<lambda(const auto:1&, const auto:2&)> >::operator()<std::pair<std::__cxx11::basic_string<char>, int>, __gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char>, int>*, std::vector<std::pair<std::__cxx11::basic_string<char>, int> > > >(std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int> &, __gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>*, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int> > > >) (this=0x7fffffffd9f7, __val={...},
  __it={first = <error: Cannot access memory at address 0x72656d6f74737563>, second = 2449}) at /usr/include/c  /12.1.0/bits/predefined_ops.h:240
#6  0x0000555555557096 in std::__unguarded_linear_insert<__gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char>, int>*, std::vector<std::pair<std::__cxx11::basic_string<char>, int> > >, __gnu_cxx::__ops::_Val_comp_iter<main(int, char**)::<lambda(const auto:1&, const auto:2&)> > >(__gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>*, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int> > > >, __gnu_cxx::__ops::_Val_comp_iter<main(int, char**)::<lambda(const auto:1&, const auto:2&)> >) (__last={first = "", second = 3}, __comp=...) at /usr/include/c  /12.1.0/bits/stl_algo.h:1789
#7  0x0000555555556c28 in std::__unguarded_insertion_sort<__gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char>, int>*, std::vector<std::pair<std::__cxx11::basic_string<char>, int> > >, __gnu_cxx::__ops::_Iter_comp_iter<main(int, char**)::<lambda(const auto:1&, const auto:2&)> > >(__gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>*, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int> > > >, __gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>*, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int> > > >, __gnu_cxx::__ops::_Iter_comp_iter<main(int, char**)::<lambda(const auto:1&, const auto:2&)> >) (__first={first = "an", second = 1}, __last={first = "", second = 0}, __comp=...) at /usr/include/c  /12.1.0/bits/stl_algo.h:1830
#8  0x000055555555695b in std::__final_insertion_sort<__gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char>, int>*, std::vector<std::pair<std::__cxx11::basic_string<char>, int> > >, __gnu_cxx::__ops::_Iter_comp_iter<main(int, char**)::<lambda(const auto:1&, const auto:2&)> > >(__gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>*, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int> > > >, __gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>*, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int> > > >, __gnu_cxx::__ops::_Iter_comp_iter<main(int, char**)::<lambda(const auto:1&, const auto:2&)> >) (__first={first = "", second = 3}, __last={first = "", second = 0}, __comp=...) at /usr/include/c  /12.1.0/bits/stl_algo.h:1850
#9  0x0000555555556806 in std::__sort<__gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char>, int>*, std::vector<std::pair<std::__cxx11::basic_string<char>, int> > >, __gnu_cxx::__ops::_Iter_comp_iter<main(int, char**)::<lambda(const auto:1&, const auto:2&)> > >(__gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>*, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int> > > >, __gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>*, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int> > > >, __gnu_cxx::__ops::_Iter_comp_iter<main(int, char**)::<lambda(const auto:1&, const auto:2&)> >) (__first={first = "", second = 3}, __last={first = "", second = 0}, __comp=...) at /usr/include/c  /12.1.0/bits/stl_algo.h:1940
#10 0x0000555555556751 in std::sort<__gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char>, int>*, std::vector<std::pair<std::__cxx11::basic_string<char>, int> > >, main(int, char**)::<lambda(const auto:1&, const auto:2&)> >(__gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>*, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int> > > >, __gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>*, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int> > > >, struct {...}) (__first={first = "", second = 3}, __last={first = "", second = 0}, __comp=...)
    at /usr/include/c  /12.1.0/bits/stl_algo.h:4853
#11 0x000055555555665c in main (argc=1, argv=0x7fffffffdfc8) at test.cc:105

CodePudding user response:

    [](const auto& a, const auto& b) -> bool {
      if (a.second > b.second) return true;
      return a.first < b.first;
    }

This comparator function violates the requirement for being a strict weak ordering comparator. I am going to use two int values for clarity, rather than an int and a string, because this makes the problem easier to understand, but the same thing happens if one of the values is a string. For the purposes of this explanation comparing two int values has the same semantics as comparing two std::strings.

So, for example, with the following two values:

   a={2, 4}
   b={1, 2}

The first one will compare to be less than the second one. 4 > 2 and the comparator returns true.

   a={1, 2}
   b={2, 4}

Same thing here. 2 > 4 is false, but 1 < 2 is true, so the comparator returns true as well.

End result: here you have two values, and each one is less than the other one.

As Mr. Spock would say: this is not logical.

This is undefined behavior.

CodePudding user response:

As was pointed to the comparison does not satisfy the requirement of the strict weak ordering.

You can resolve the problem using the standard function std::tie declared in the header <utility> the following way

  std::sort(order.begin(), order.end(),
    [](const auto& a, const auto& b) -> bool {
        return std::tie( b.second, a.first ) < std::tie( a.second, b.first );
    } );
  • Related