Home > Software engineering >  How to display k least and most elements of a map?
How to display k least and most elements of a map?

Time:11-03

Lets say I have a map with a string and integer. After reading a text file into it, the strings contain all the different words and the integers contain the frequency of each word. I would like to display the k most common/least common words in this text file, but my map is sorting via the strings rather than integers. Here is some of my code:

typedef map<string, int> words;
ifstream file("input.txt");
string word;

while (file >> word)
{
    words[word]  ; 
}

for (WordMap::iterator w = words.begin(); w != words.end(); w  )
{
    cout << w->first << " = " << w->second << endl;
}

This code is working to get a list of words and the frequency of them all; however my output looks like this:

apple = 34 
cat = 4
dog = 12
...

How can I print k most frequent/least frequent words in this map? Is it because I have a string as the first member and an integer as the second member? If I have to reverse the placement of these two, then how would I add frequency count for each string?

CodePudding user response:

You can make a compare function, pass all the map's values using a container and sort it, then compare the last k elements.

bool compare (std::pair<std::string, int>& lhs, std::pair<std::string, int>& rhs)
{
    return lhs.second < rhs.second;
}

int main()
{
    std::vector<std::pair<std::string, int>> vec;

    for(const auto& pr: words)
        vec.push_back(make_pair(pr.first, pr.second));

    std::sort(vec.begin(), vec.end(), compare);
    
   for(const auto& pr : vec)
   {
       // print all the values out
       std::cout << pr.first << ' ' << pr.second << '\n';
   }
}

CodePudding user response:

There is a more or less standard approach for counting something in a container and then getting and showing its rank.

For the counting part we can use an associative container like a std::map or a std::unordered_map. And here we associate a "key", in this case the word, to a count, with a value, in this case the count of the specific word.

And luckily, and basically the reason for selecting such an approach is that the maps have a very nice index operator[]. This will look for the given key and, if found, return a reference to the value. If not found, then it will create a new entry with the key and return a reference to the new entry. So, in both cases, we will get a reference to the value used for counting. And then we can simply write:

std::unordered_map<char,int> counter{};
counter[word]  ;

And that looks really intuitive.

After this operation, you have already the frequency table. Either sorted by the key (the word), by using a std::map or unsorted, but faster accessible with a std::unordered_map.

In your case, where you are just interested in the sorted count, a std::unordered_map is advisable, because there is no need to sort data in a std::map by its key and later make no use of this sorting.

Then, you want to sort according to the frequency/count. Unfortunately, this is not possible with maps. Because a major property of a the map - container family is their reference to a key and not a value or count.

Therefore we need to use a second container, like a std::vector, or such, which we then can sort using std::sort for any given predicate, or, we can copy the values into a container, like a std::multiset that implicitly orders its elements. And because this is just a one liner, this is the recommended solution.

Additionally, because writing all these long names for the std containers, we create alias names, with the using keyword.

After we have the rank of the words, so, the list of words sorted by its count, we can use iterators and loops to access the data and output them.

We need to be careful that the container with the sorted values contains enough vales and then use a simple for loop, or a standard algorithm like std::copy_n.

Hot to limit the value? There are 3 potential solutions.

  • A handwritten comparison function against the size if (elementsToPrint > container.size()) elementsToPrint = container.size()
  • std::min, which does basically the above. So, elementsToPrint = std::min(elementsToPrint, container.size());
  • std::clamp. A dedicated function, defined for this purpose.Please read here about this function. Result would be: ````elementsToPrint = std::clamp(elementsToPrint,1, container.size());

Good. Now, we want to show the lowest k ranks and the highest k ranks. We can do this by using the an iterator pointing to the begin like begin() or,talking about the end, std::rbegin()

After all this, we now write ultra-compact code and fulfill the task with just a few lines of code:

#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
#include <utility>
#include <set>
#include <unordered_map>
#include <type_traits>
#include <iomanip>
#include <algorithm>

// ------------------------------------------------------------
// Create aliases. Save typing work and make code more readable
using Pair = std::pair<std::string, unsigned int>;

// Standard approach for counter
using Counter = std::unordered_map<Pair::first_type, Pair::second_type>;

// Sorted values will be stored in a multiset
struct Comp { bool operator ()(const Pair& p1, const Pair& p2) const { return (p1.second == p2.second) ? p1.first<p2.first : p1.second>p2.second; } };
using Rank = std::multiset<Pair, Comp>;
// ------------------------------------------------------------

int main() {

    // Open file and check, if it can be opened.
    if (std::ifstream inputFileStream{ "input.txt" }; inputFileStream) {

        // Definition of our counter
        Counter counter;


        // Count -------------------------
        for (std::string word{}; inputFileStream >> word; counter[word]  );

        // Sort
        Rank rank(counter.begin(), counter.end());


        // Output ------------------------
        size_t elementsToShow = 3;

        // Show 3 biggest counts
        auto fromTop = rank.begin();
        for (size_t i{}; i < std::clamp(elementsToShow, 1u, rank.size());   i)
            std::cout << fromTop->first << '\t' << fromTop  ->second << '\n';

        // Show 3 smallest counts
        auto fromBottom = rank.rbegin();
        for (size_t i{}; i < std::clamp(elementsToShow, 1u, rank.size());   i)
            std::cout << fromBottom->first << '\t' << fromBottom  ->second << '\n';
    }
    else std::cerr << "\n*** Error. Could not open input file\n\n";
}

Often it is the case that you need to continue to work with the gathered count values. Then the use of std::partial_sort_copy is advisable. Basically we will at this time put all the counts into resulting vectors. Namely top and bottom.

Sorting criteria may also be adjusted in the lambda of the copy function.

Please see one possible solution below:

#include <iostream>
#include <fstream>
#include <string>
#include <utility>
#include <unordered_map>
#include <type_traits>
#include <algorithm>
#include <vector>

// ------------------------------------------------------------
// Create aliases. Save typing work and make code more readable
using Pair = std::pair<std::string, unsigned int>;

// Standard approach for counter
using Counter = std::unordered_map<Pair::first_type, Pair::second_type>;

// Resulting data
using Result = std::vector<Pair>;
// ------------------------------------------------------------

int main() {

    // Open file and check, if it can be opened.
    if (std::ifstream inputFileStream{ "input.txt" }; inputFileStream) {

        // Definition of our counter
        Counter counter;

        // Count -------------------------
        for (std::string word{}; inputFileStream >> word; counter[word]  );

        // Build result ------------------------
        unsigned int three = std::min(3u, counter.size());

        Result top(three);
        Result bottom(three);

        // Get 3 biggest counts
        std::partial_sort_copy(counter.begin(), counter.end(), top.begin(), top.end(),
            [](const Pair& p1, const Pair& p2) { return p1.second > p2.second; });
        
        // get 3 smallest counts
        std::partial_sort_copy(std::next(counter.begin(), counter.size() - three), counter.end(), bottom.begin(), bottom.end(),
            [](const Pair& p1, const Pair& p2) { return p1.second > p2.second; });

        // Output  ------------------------
        for (const auto& [word, count] : top) std::cout << word << '\t' << count << '\n';
        for (const auto& [word, count] : bottom) std::cout << word << '\t' << count << '\n';
    }
    else std::cerr << "\n*** Error. Could not open input file\n\n";
}
  • Related