I want to make a table of letter frequency, just like this
So, I have variable str
that contains some text, and I have map<char, int> m
, that contains number of occurrences of each letter.
So I have something like that:
vector<char> get_rank(const string& str)
{
map<char, int> m;
for (char i = 'a'; i <= 'z'; i )
m[i] = 0;
for (char ch : str)
m[ch] ;
for (auto& it : m)
v.push_back(it.first);
return v;
}
But map is ordered by keys, not by values. Is there a way to "overload" the comparator of map to order each element by value? If the answer is no, how can I implement the same idea but with the same elegancy. Because every idea I have seems a little dull and rough.
CodePudding user response:
It is not possible to sort a map with respect to its mapped values. The order of elements of a map is managed by the map according to its comparator which only considers keys.
However, you do not need a map to count frequencies when the keys are contiguous and of limited range.
Instead of a map, use a std::vector<letter_and_count>
with
struct letter_and_count {
char letter;
int count = 0;
};
Initialize the vector with elements with letter
initialized to a
till z
then increment v[ i - 'a']
when you encounter the letter i
in the string. Then use std::sort
to sort the vector according to the count
member.
PS: Note comments below. The letters a
till z
are not necessarily contiguous. However (idea also from eerorika) you can use a simple lookup table:
std::string alphabet{"abcdefghijklmnopqrstuvwxyz"};
Then first find the letter in that alphabet
and use its index in the alphabet for the counting.
CodePudding user response:
Is there a way to "overload" a comparator of map to order each element by value?
No.
If the answer is no, how can I implement the same idea but with the same elegancy.
Depends on use case. An elegant solution to the example is to sort the vector.
CodePudding user response:
Instead of trying to do something with a STL container that is clearly not designed for this (ordering a map) maybe use a different container.
You can always use std::pair<char, int>
as the type for an std::vector
.
These can be sorted.
I'd prefer std::vector<std::pair<char,int>>
simply because I'm more used to working with vectors but that's up to you.
CodePudding user response:
I find the approach ok to use specialized containers.
For counting a std::map
or a std::unordered_map
should be used. After that it can be copied into a std::vector
and sorted as you like.
This will work in a universal way. It is also possible to find the most frequent used element using a max heap.
Or, In the below example, we can get the x topmost elements. And this with just some few lines of code. The below will count nearly evrything and sortit according to the value:
#include <iostream>
#include <utility>
#include <unordered_map>
#include <algorithm>
#include <vector>
#include <iterator>
#include <type_traits>
// Helper for type trait We want to identify an iterable container ----------------------------------------------------
template <typename Container>
auto isIterableHelper(int) -> decltype (
std::begin(std::declval<Container&>()) != std::end(std::declval<Container&>()), // begin/end and operator !=
std::declval<decltype(std::begin(std::declval<Container&>()))&>(), // operator
void(*std::begin(std::declval<Container&>())), // operator*
void(), // Handle potential operator ,
std::true_type{});
template <typename T>
std::false_type isIterableHelper(...);
// The type trait -----------------------------------------------------------------------------------------------------
template <typename Container>
using is_iterable = decltype(isIterableHelper<Container>(0));
// Some Alias names for later easier reading --------------------------------------------------------------------------
template <typename Container>
using ValueType = std::decay_t<decltype(*std::begin(std::declval<Container&>()))>;
template <typename Container>
using Pair = std::pair<ValueType<Container>, size_t>;
template <typename Container>
using Counter = std::unordered_map<ValueType<Container>, size_t>;
// Function to get the k most frequent elements used in any Container ------------------------------------------------
template <class Container>
auto topKFrequent(const Container& data, size_t k) {
if constexpr (is_iterable<Container>::value) {
// Count all occurences of data
Counter<Container> counter{};
for (const auto& d : data) counter[d] ;
// For storing the top k
std::vector<Pair<Container>> top(k);
// Get top k
std::partial_sort_copy(counter.begin(), counter.end(), top.begin(), top.end(),
[](const std::pair<int, size_t >& p1, const std::pair<int, size_t>& p2) { return p1.second > p2.second; });
return top;
}
else
return data;
}
int main() {
std::vector testVector{ 1,2,2,3,3,3,4,4,4,4,5,5,5,5,6,6,6,6,6,7 };
for (const auto& p : topKFrequent(testVector, 2)) std::cout << "Value: " << p.first << " \t Count: " << p.second << '\n';
std::cout << '\n';
double cStyleArray[] = { 1.1, 2.2, 2.2, 3.3, 3.3, 3.3 };
for (const auto& p : topKFrequent(cStyleArray, 2)) std::cout << "Value: " << p.first << " \t Count: " << p.second << '\n';
std::cout << '\n';
std::string s{ "abbcccddddeeeeeffffffggggggg" };
for (const auto& p : topKFrequent(s, 2)) std::cout << "Value: " << p.first << " \t Count: " << p.second << '\n';
std::cout << '\n';
double value = 12.34;
std::cout << topKFrequent(value, 2) << "\n";
return 0;
}
The above code is for C 17
And below is an example to find the most frequent element in any container.
But this is a C 20 solution:
#include <iostream>
#include <utility>
#include <unordered_map>
#include <queue>
#include <vector>
#include <iterator>
#include <type_traits>
#include <string>
// Some Alias names for later easier reading --------------------------------------------------------------------------
template <typename Container>
using ValueType = std::ranges::range_value_t<Container>;
template <typename Container>
using Pair = std::pair<ValueType<Container>, size_t>;
template <typename Container>
using Counter = std::unordered_map<ValueType<Container>, size_t>;
template <typename Container>
using UnderlyingContainer = std::vector<Pair<Container>>;
// Predicate Functor
template <class Container> struct LessForSecondOfPair {
bool operator ()(const Pair<Container>& p1, const Pair<Container>& p2) { return p1.second < p2.second; }
};
template <typename Container>
using MaxHeap = std::priority_queue<Pair<Container>, UnderlyingContainer<Container>, LessForSecondOfPair<Container>>;
// Calculate max element ---------------------------------------------------------------------------------------------
template <class Container>
auto topFrequent(const Container& data) {
if constexpr (std::ranges::range<Container>) {
// Count all occurences of data
Counter<Container> counter{};
for (const auto& d : data) counter[d] ;
// Build a Max-Heap
MaxHeap<Container> maxHeap(counter.begin(), counter.end());
// Return most frequent number
return maxHeap.top().first;
}
else
return data;
}
// Test
int main() {
std::vector testVector{ 1,2,2,3,3,3,4,4,4,4,5,5,5,5,6,6,6,6,6,7 };
std::cout << "Most frequent is: " << topFrequent(testVector) << "\n";
double cStyleArray[] = { 1.1, 2.2, 2.2, 3.3, 3.3, 3.3 };
std::cout << "Most frequent is: " << topFrequent(cStyleArray) << "\n";
std::string s{ "abbcccddddeeeeeffffffggggggg" };
std::cout << "Most frequent is: " << topFrequent(s) << "\n";
double value = 12.34;
std::cout << "Most frequent is: " << topFrequent(value) << "\n";
return 0;
}