I wrote a program that counts the number of occurrences of strings using a std::unordered_map
, then copies the values from the map to a std::vector
and calls std::sort
to sort them first by counts, then by string.
See the code below. Please, excuse the long list of strings. I couldn't quickly find a shorter list that would reproduce the behavior.
Compiler explorer concurs the segfault for GCC 10 and above.
Is this a compiler bug or am I missing something obvious here?
#include <unordered_map>
#include <vector>
#include <algorithm>
#include <string>
int main(int argc, char** argv) {
std::unordered_map<
std::string,
int
> terms;
std::vector<const char*> input {
"Head", "of", "Data", "News", "UK", "We", "are", "News", "UK", "is", "a",
"great", "company", "full", "talented", "and", "creative", "people", "We",
"are", "an", "organisation", "that", "holds", "journalism", "at", "its",
"very", "heart", "Our", "newspapers", "and", "digital", "products",
"include", "some", "of", "the", "most", "powerful", "media", "brands", "in",
"the", "English", "speaking", "world:", "the", "Times", "The", "Sunday",
"Times", "and", "The", "Sun", "reaching", "30", "million", "people", "each",
"week", "Despite", "differences", "in", "audience", "and", "content", "our",
"brands", "are", "united", "by", "a", "commitment", "to", "independent",
"journalism", "that", "connects", "our", "customers"
};
for (const char* str : input) {
terms[str];
}
std::vector<std::pair<std::string,int>> order;
order.reserve(terms.size());
for (const auto& [ term, n ] : terms)
order.emplace_back(term,n);
std::sort(order.begin(), order.end(),
[](const auto& a, const auto& b) -> bool {
if (a.second > b.second) return true;
return a.first < b.first;
}
);
}
Here's the output from gdb:
Program received signal SIGSEGV, Segmentation fault.
__memcmp_evex_movbe () at ../sysdeps/x86_64/multiarch/memcmp-evex-movbe.S:118
118 VMOVU_MASK (%rsi), %YMM2{%k2}
(gdb) bt
#0 __memcmp_evex_movbe () at ../sysdeps/x86_64/multiarch/memcmp-evex-movbe.S:118
#1 0x00007ffff7d4d7ad in std::char_traits<char>::compare (__n=<optimized out>, __s2=<optimized out>, __s1=<optimized out>)
at /usr/src/debug/gcc-build/x86_64-pc-linux-gnu/libstdc -v3/include/bits/char_traits.h:385
#2 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::compare (this=<optimized out>, __str=...)
at /usr/src/debug/gcc-build/x86_64-pc-linux-gnu/libstdc -v3/include/bits/basic_string.h:3148
#3 0x000055555555b408 in std::operator< <char, std::char_traits<char>, std::allocator<char> > (__lhs="Data", __rhs=<error: Cannot access memory at address 0x72656d6f74737563>)
at /usr/include/c /12.1.0/bits/basic_string.h:3694
#4 0x00005555555573e2 in operator()<std::pair<std::__cxx11::basic_string<char>, int>, std::pair<std::__cxx11::basic_string<char>, int> > (__closure=0x7fffffffd9f7, a={...}, b={...}) at test.cc:108
#5 0x000055555555744b in __gnu_cxx::__ops::_Val_comp_iter<main(int, char**)::<lambda(const auto:1&, const auto:2&)> >::operator()<std::pair<std::__cxx11::basic_string<char>, int>, __gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char>, int>*, std::vector<std::pair<std::__cxx11::basic_string<char>, int> > > >(std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int> &, __gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>*, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int> > > >) (this=0x7fffffffd9f7, __val={...},
__it={first = <error: Cannot access memory at address 0x72656d6f74737563>, second = 2449}) at /usr/include/c /12.1.0/bits/predefined_ops.h:240
#6 0x0000555555557096 in std::__unguarded_linear_insert<__gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char>, int>*, std::vector<std::pair<std::__cxx11::basic_string<char>, int> > >, __gnu_cxx::__ops::_Val_comp_iter<main(int, char**)::<lambda(const auto:1&, const auto:2&)> > >(__gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>*, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int> > > >, __gnu_cxx::__ops::_Val_comp_iter<main(int, char**)::<lambda(const auto:1&, const auto:2&)> >) (__last={first = "", second = 3}, __comp=...) at /usr/include/c /12.1.0/bits/stl_algo.h:1789
#7 0x0000555555556c28 in std::__unguarded_insertion_sort<__gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char>, int>*, std::vector<std::pair<std::__cxx11::basic_string<char>, int> > >, __gnu_cxx::__ops::_Iter_comp_iter<main(int, char**)::<lambda(const auto:1&, const auto:2&)> > >(__gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>*, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int> > > >, __gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>*, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int> > > >, __gnu_cxx::__ops::_Iter_comp_iter<main(int, char**)::<lambda(const auto:1&, const auto:2&)> >) (__first={first = "an", second = 1}, __last={first = "", second = 0}, __comp=...) at /usr/include/c /12.1.0/bits/stl_algo.h:1830
#8 0x000055555555695b in std::__final_insertion_sort<__gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char>, int>*, std::vector<std::pair<std::__cxx11::basic_string<char>, int> > >, __gnu_cxx::__ops::_Iter_comp_iter<main(int, char**)::<lambda(const auto:1&, const auto:2&)> > >(__gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>*, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int> > > >, __gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>*, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int> > > >, __gnu_cxx::__ops::_Iter_comp_iter<main(int, char**)::<lambda(const auto:1&, const auto:2&)> >) (__first={first = "", second = 3}, __last={first = "", second = 0}, __comp=...) at /usr/include/c /12.1.0/bits/stl_algo.h:1850
#9 0x0000555555556806 in std::__sort<__gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char>, int>*, std::vector<std::pair<std::__cxx11::basic_string<char>, int> > >, __gnu_cxx::__ops::_Iter_comp_iter<main(int, char**)::<lambda(const auto:1&, const auto:2&)> > >(__gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>*, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int> > > >, __gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>*, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int> > > >, __gnu_cxx::__ops::_Iter_comp_iter<main(int, char**)::<lambda(const auto:1&, const auto:2&)> >) (__first={first = "", second = 3}, __last={first = "", second = 0}, __comp=...) at /usr/include/c /12.1.0/bits/stl_algo.h:1940
#10 0x0000555555556751 in std::sort<__gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char>, int>*, std::vector<std::pair<std::__cxx11::basic_string<char>, int> > >, main(int, char**)::<lambda(const auto:1&, const auto:2&)> >(__gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>*, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int> > > >, __gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>*, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int> > > >, struct {...}) (__first={first = "", second = 3}, __last={first = "", second = 0}, __comp=...)
at /usr/include/c /12.1.0/bits/stl_algo.h:4853
#11 0x000055555555665c in main (argc=1, argv=0x7fffffffdfc8) at test.cc:105
CodePudding user response:
[](const auto& a, const auto& b) -> bool {
if (a.second > b.second) return true;
return a.first < b.first;
}
This comparator function violates the requirement for being a strict weak ordering comparator. I am going to use two int
values for clarity, rather than an int and a string, because this makes the problem easier to understand, but the same thing happens if one of the values is a string. For the purposes of this explanation comparing two int values has the same semantics as comparing two std::string
s.
So, for example, with the following two values:
a={2, 4}
b={1, 2}
The first one will compare to be less than the second one. 4 > 2
and the comparator returns true.
a={1, 2}
b={2, 4}
Same thing here. 2 > 4
is false, but 1 < 2
is true, so the comparator returns true as well.
End result: here you have two values, and each one is less than the other one.
As Mr. Spock would say: this is not logical.
This is undefined behavior.
CodePudding user response:
As was pointed to the comparison does not satisfy the requirement of the strict weak ordering.
You can resolve the problem using the standard function std::tie
declared in the header <utility>
the following way
std::sort(order.begin(), order.end(),
[](const auto& a, const auto& b) -> bool {
return std::tie( b.second, a.first ) < std::tie( a.second, b.first );
} );