Home > Mobile >  How to efficiently count word occurrences in Python without additional modules
How to efficiently count word occurrences in Python without additional modules

Time:10-12

Background

I'm working on a HackerRank problem Word Order. The task is to

  1. Read the following input from stdin

    4
    bcdef
    abcdefg
    bcde
    bcdef
    
  2. Produce the output that reflects:

    • Number of unique words in first line
    • Count of occurrences for each unique words Example:
    3       # Number of unique words
    2 1 1   # count of occurring words, 'bcdef' appears twice = 2
    

Problem

I've coded two solutions, the second one passes initial tests but fail due to exceeding time limit. First one would also work but I was unnecessarily sorting outputs (time limit issue would occur though).

Notes

  • In first solution I was unnecessarily sorting values, this is fixed in the second solution
  • I'm keen to be making better (proper) use of standard Python data structures, list/dictionary comprehension - I would be particularly keen to receive a solution that doesn't import any addittional modules, with exception of import os if needed.

Code

import os

def word_order(words):
    # Output no of distinct words
    distinct_words = set(words)
    n_distinct_words = len(distinct_words)
    print(str(n_distinct_words))
    
    # Count occurrences of each word
    occurrences = []
    
    for distinct_word in distinct_words:
        n_word_appearances = 0
        for word in words:
            if word == distinct_word:
                n_word_appearances  = 1
        occurrences.append(n_word_appearances)
    occurrences.sort(reverse=True)
    print(*occurrences, sep=' ')
    # for o in occurrences:
    #     print(o, end=' ')
    
def word_order_two(words):
    '''
    Run through all words and only count multiple occurrences, do the maths
    to calculate unique words, etc. Attempt to construct a dictionary to make
    the operation more memory efficient.
    '''
    # Construct a count of word occurrences
    dictionary_words = {word:words.count(word) for word in words}
    
    # Unique words are equivalent to dictionary keys
    unique_words = len(dictionary_words)
    
    # Obtain sorted dictionary values
    # sorted_values = sorted(dictionary_words.values(), reverse=True)
    result_values = " ".join(str(value) for value in dictionary_words.values())
    # Output results
    print(str(unique_words))
    print(result_values)
    return 0

if __name__ == '__main__':
    q = int(input().strip())

    inputs = []
    for q_itr in range(q):
        s = input()
        inputs.append(s)
        
    # word_order(words=inputs)
    word_order_two(words=inputs)

CodePudding user response:

Without any imports, you could count unique elements by

len(set(words))

and count their occurrences by

def counter(words):
    count = dict()
    for word in words:
        if word in count:
            count[word]  = 1
        else:
            count[word] = 1
    return count.values()

CodePudding user response:

Those nested loops are very bad performance wise (they make your algorithm quadratic) and quite unnecessary. You can get all counts in single iteration. You could use a plain dict or the dedicated collections.Counter:

from collections import Counter

def word_order(words):
    c = Counter(words)
    print(len(c))
    print(" ".join(str(v) for _, v in c.most_common()))

The "manual" implementation that shows the workings of the Counter and its methods:

def word_order(words):
    c = {}
    for word in words:
        c[word] = c.get(word, 0)   1
    print(len(c))
    print(" ".join(str(v) for v in sorted(c.values(), reverse=True)))
    # print(" ".join(map(str, sorted(c.values(), reverse=True))))

CodePudding user response:

You can use Counter then print output like below:

>>> from collections import Counter
>>> def counter_words(words):
...   cnt = Counter(words)
...   print(len(cnt))
...   print(*[str(v) for k,v in c.items()] , sep=' ')

>>> inputs = ['bcdef' , 'abcdefg' , 'bcde' , 'bcdef']
>>> counter_words(inputs)
3
2 1 1
  • Related