Python is there a way to sort a list of lists by frequency of a specific element?

Time:09-24

I know Python's sort() function accepts a key argument that lets you sort by a custom criterion. However, is there a way to sort a list of per-sentence word frequencies by the frequency of one specific word? My code takes a list of sentences and turns each sentence into its own collection of words: it lowercases everything, removes punctuation, and records the frequency of each word.

For example my list would be given a list like:

['Hello world! My name is Mary, However', 'Is the water running? Is it cold?', 'Everything is is is okay.']

And it would be transformed into:

[ {'hello': 1, 'world': 1, 'my': 1, 'name': 1, 'is': 1, 'mary': 1, 'however': 1}, {'is': 2, 'the': 1, 'water': 1, 'running': 1, 'it': 1, 'cold': 1}, {'everything': 1, 'is': 3, 'okay': 1} ]

In this scenario I would want to sort the list of sentences by the frequency of the word 'is'. How could I go about that without changing the word lists?

Answer:

You can use collections.Counter:

from collections import Counter

data = ['Hello world! My name is Mary, However', 'Is the water running? Is it cold?', 'Everything is is is okay.']

sorted_counters = sorted([dict(Counter(sentence.lower().split(' '))) for sentence in data], key = lambda x: x.get('is', 0))

print(sorted_counters)
# [{'hello': 1, 'world!': 1, 'my': 1, 'name': 1, 'is': 1, 'mary,': 1, 'however': 1}, {'is': 2, 'the': 1, 'water': 1, 'running?': 1, 'it': 1, 'cold?': 1}, {'everything': 1, 'is': 3, 'okay.': 1}]

Keep in mind that this code doesn't remove punctuation or other non-alphanumeric characters, which is why the output contains keys like 'world!' and 'mary,'. If you want to remove them, you can use the following helper function:

import re

def replace_non_alphanumeric(word):
    return re.sub(r'[^0-9a-zA-Z]+', '', word)

and apply it to every word in sentence.lower().split(' ').
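Putting the two pieces together might look like the minimal sketch below. The count_words helper is a hypothetical name introduced here, and it also skips any token that becomes empty after cleaning (for example a lone "!"), a case the answer above does not cover. With the example data, the result matches the dictionaries in the question.

```python
import re
from collections import Counter


def replace_non_alphanumeric(word):
    # Delete every run of characters that is not an ASCII letter or digit
    return re.sub(r'[^0-9a-zA-Z]+', '', word)


def count_words(sentence):
    # Hypothetical helper: lowercase, split on spaces, clean each word, count
    words = (replace_non_alphanumeric(w) for w in sentence.lower().split(' '))
    # Skip tokens that were pure punctuation and are now empty strings
    return dict(Counter(w for w in words if w))


data = ['Hello world! My name is Mary, However',
        'Is the water running? Is it cold?',
        'Everything is is is okay.']

# Sentences without 'is' would sort first, with a count of 0
sorted_counters = sorted((count_words(s) for s in data),
                         key=lambda counts: counts.get('is', 0))
print(sorted_counters)
```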

Answer:

First, we need to generate the dictionaries. To do that, we take each sentence, remove punctuation, lowercase it, and split it into words:

sentences = ['Hello world! My name is Mary, However', 'Is the water running? Is it cold?', 'Everything is is is okay.']

processed_sentences = []
for sentence in sentences:
    sentence = sentence.replace("!", "").replace(",", "").replace("?", "").replace(".", "")
    sentence = sentence.lower()
    sentence_words = sentence.split(" ")
    processed_sentences.append(sentence_words)
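If you would rather not chain one replace() call per punctuation mark, a common alternative (swapped in here, not part of the answer's approach) is str.translate with a deletion table built from string.punctuation. This sketch assumes only ASCII punctuation needs removing; characters such as curly quotes are not in string.punctuation and would survive.

```python
import string

sentences = ['Hello world! My name is Mary, However',
             'Is the water running? Is it cold?',
             'Everything is is is okay.']

# Translation table that maps every ASCII punctuation character to None (deletes it)
strip_punctuation = str.maketrans('', '', string.punctuation)

processed_sentences = []
for sentence in sentences:
    # split() with no argument also collapses runs of whitespace
    sentence_words = sentence.translate(strip_punctuation).lower().split()
    processed_sentences.append(sentence_words)

print(processed_sentences)
```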

Next, we need to count each word. Python's standard library has a class for exactly that: collections.Counter.

from collections import Counter

counted_sentences = []
for sentence_list in processed_sentences:
    counted_words = Counter(sentence_list)
    counted_sentences.append(counted_words)
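One detail worth knowing at this step: a Counter returns 0 for a word it has never seen instead of raising KeyError, and that lookup does not add the word. So, while the items are still Counters, a sort key can index them directly. A small sketch with made-up counters:

```python
from collections import Counter

counted = Counter(['everything', 'is', 'is', 'is', 'okay'])

print(counted['is'])       # 3
print(counted['water'])    # 0: a missing word counts as zero, no KeyError
print('water' in counted)  # False: the lookup above did not insert the key

# Indexing works as a sort key because missing words count as 0
counted_sentences = [Counter(['is', 'is']), Counter(['hello']), Counter(['is'])]
counted_sentences.sort(key=lambda cw: cw['is'])
print([cw['is'] for cw in counted_sentences])  # [0, 1, 2]
```

Once the Counters are converted to plain dicts with dict(...), that zero default is gone, so a lookup on the dicts needs .get("is", 0).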

Then we sort the list. Python lists have a sort method, and its key argument tells it what to sort by. Here the key is the count of "is" in each sentence, defaulting to 0 when the word does not appear:

counted_sentences.sort(key=lambda cw: cw.get("is", 0))

Note that list.sort sorts the list in place, so we're not assigning it to anything.
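If you also need to keep the original order (the question asks to sort without changing the word lists), the built-in sorted() is the non-mutating counterpart of list.sort: it returns a new list and leaves the input alone. A small sketch with made-up counters:

```python
from collections import Counter

counted_sentences = [Counter({'is': 3, 'okay': 1}), Counter({'hello': 1, 'is': 1})]

# sorted() returns a new list; counted_sentences keeps its original order
by_is = sorted(counted_sentences, key=lambda cw: cw.get('is', 0))

print([cw['is'] for cw in counted_sentences])  # [3, 1]
print([cw['is'] for cw in by_is])              # [1, 3]

# The new list holds the same Counter objects, so no word counts are copied or changed
print(by_is[0] is counted_sentences[1])        # True

# reverse=True puts the sentence with the most "is" first
most_first = sorted(counted_sentences, key=lambda cw: cw.get('is', 0), reverse=True)
print([cw['is'] for cw in most_first])         # [3, 1]
```

Both list.sort and sorted are stable, so sentences with the same count of "is" keep their relative order.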

Finally, we convert each Counter back to a plain dict to get your desired output:

result = []
for counted_sentence in counted_sentences:
    result.append(dict(counted_sentence))

print(result)
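For reuse, the steps above could be wrapped in a single function that takes the word to sort by. This is a sketch: sort_by_word_frequency and its descending parameter are hypothetical names, and it strips punctuation with str.translate and string.punctuation instead of the chained replace() calls, so it also removes characters such as ';' and '"'.

```python
import string
from collections import Counter


def sort_by_word_frequency(sentences, word, descending=False):
    """Return one word-count dict per sentence, ordered by how often `word` appears."""
    # Table that deletes every ASCII punctuation character
    strip_punctuation = str.maketrans('', '', string.punctuation)
    counts = [
        dict(Counter(sentence.translate(strip_punctuation).lower().split()))
        for sentence in sentences
    ]
    target = word.lower()
    # Sentences that lack the word sort as if it appeared zero times
    return sorted(counts, key=lambda c: c.get(target, 0), reverse=descending)


sentences = ['Hello world! My name is Mary, However',
             'Is the water running? Is it cold?',
             'Everything is is is okay.']

result = sort_by_word_frequency(sentences, 'is', descending=True)
print([c['is'] for c in result])  # [3, 2, 1]
```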

And there you go.
