Get number of times values above 1 show up in a dictionary

I'm working with a dictionary, trying to find all values (repetitions of words in a text) above 1 and store them in a list with this function:

def get_repetitions(text):
    n_grams_lengths = [1,2,3,4,5,6]
    ngrams_count = {}
    for n in n_grams_lengths:
        ngrams = tuple(nltk.ngrams(text.split(' '), n=n))
        ngrams_count.update({' '.join(i) : ngrams.count(i) for i in ngrams})
        reps_list = []            
        reps_variables = {values for (values) in ngrams_count.values() if values > 1}
        reps_list.append(reps_variables)
    return reps_list

When I do this, however, I get the list of values found in the dictionary, but not how many times they appear. How would I go about getting this?

Also, say the value "2" is in the dictionary 3 times and the value "5" 4 times. Would there be a way of getting something like this: 2,2,2,5,5,5,5?

CodePudding user response：

If 'text' is a str containing some text, then:

text = text.split()
result = {i: text.count(i) for i in text if text.count(i) > 1}

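However, str.split() with no arguments splits on any run of whitespace, whereas the split(' ') in the question splits on single spaces only. Neither removes punctuation, so "word" and "word," are counted as different words. Depending on the text, the counts may not be as accurate as one would hope.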
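
To illustrate the caveat, here is a minimal sketch that uses collections.Counter from the standard library instead of repeated list.count calls. The sample sentence is made up for the example and is not from the question.

```python
from collections import Counter

# Hypothetical sample text: note the double space and the trailing comma on "cat,".
text = "the cat saw the cat,  and the dog"

words = text.split()     # splits on any run of whitespace
counts = Counter(words)  # one pass over the words instead of list.count per word

# Keep only the words that occur more than once.
repeated = {word: n for word, n in counts.items() if n > 1}
print(repeated)

# split(' ') keeps an empty string where the double space was.
print(text.split(' '))
```

Here "cat" and "cat," end up as two different keys, so only "the" is reported as repeated.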

If you have a dictionary with words as keys and their numbers of occurrences as values, the second question can be solved as follows:

result = ' '.join(word for word in dictionary for _ in range(dictionary[word]))
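
A possible alternative, assuming the dictionary maps words to positive integer counts, is Counter.elements(), which repeats each key as many times as its count. The `dictionary` contents below are a made-up example.

```python
from collections import Counter

# Hypothetical word -> count mapping, for illustration only.
dictionary = {"car": 2, "foo": 3}

# elements() yields each key repeated by its count, in insertion order.
repeated_words = list(Counter(dictionary).elements())
result = " ".join(repeated_words)
print(result)  # car car foo foo foo
```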

CodePudding user response：

Your issue is that you already have a dictionary mapping the n-grams to their frequencies, but you extract only the values into a set, which drops the n-grams and collapses duplicate values. Instead of doing that, you just need to filter ngrams_count:

ngrams_count = {"car": 5, "bob": 1, "foo": 3}

reps_variables = dict(filter(lambda elem: elem[1] > 1, ngrams_count.items()))
reps_variables
>>> {"car": 5, "foo": 3}

Then, for the second part of your question, we can do this:

import itertools

frequencies = list(itertools.chain.from_iterable([k] * v for k, v in reps_variables.items()))
frequencies
>>> ["car", "car", "car", "car", "car", "foo", "foo", "foo"]
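
If the goal is instead the values themselves with their repeats (the 2,2,2,5,5,5,5 from the question) and how often each value occurs, a sketch working on the values rather than the keys might look like this. The sample dictionary is invented to match the numbers in the question.

```python
from collections import Counter

# Hypothetical n-gram counts: the value 2 occurs three times, 5 occurs four times.
ngrams_count = {"a": 2, "b": 2, "c": 2, "d": 5, "e": 5, "f": 5, "g": 5, "h": 1}

# Keep every value above 1, duplicates included, in sorted order.
# A list is used here because a set would collapse the duplicates.
reps_list = sorted(v for v in ngrams_count.values() if v > 1)

# Count how many times each of those values appears.
value_frequencies = Counter(reps_list)

print(reps_list)          # [2, 2, 2, 5, 5, 5, 5]
print(value_frequencies)  # Counter({5: 4, 2: 3})
```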