Count words of interest in a list or array

Time:08-24

I have the following scenario:

import pandas as pd
import numpy as np

stuff = ['Elon Musk', 'elon musk', 'elon Musk', 'Elon Musk is awesome', "Who doesn't love Elon Musk"]

I'd like to count how many times the name Elon Musk appears across the elements of the 'stuff' list, ignoring case (upper or lower case should both count). The expected result is a count of 5, since Elon Musk (case-insensitive) appears in every element of the list.

Answer:

Something like this should work:

l = ['Elon Musk', 'elon musk', 'elon Musk', 'Elon Musk is awesome', "Who doesn't love Elon Musk"]

sum(1 for x in l if 'elon musk' in x.lower())

Output

5
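The same idea can be wrapped in a small reusable function. This is a minimal sketch: the name count_containing is hypothetical, and it uses str.casefold() instead of lower(), which applies more aggressive Unicode case folding (for example, German ß folds to ss). Each string counts at most once, however many times the name appears in it.

```python
def count_containing(items, term):
    """Count how many strings in `items` contain `term`, ignoring case.

    Hypothetical helper: each string contributes at most 1, even if the
    term occurs several times in it.
    """
    needle = term.casefold()
    return sum(1 for item in items if needle in item.casefold())


stuff = ['Elon Musk', 'elon musk', 'elon Musk', 'Elon Musk is awesome', "Who doesn't love Elon Musk"]
print(count_containing(stuff, 'Elon Musk'))  # 5
```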

Edit:

If the name can appear more than once in the same string, you can count every occurrence with a regex:

import re
l = ['Elon Musk', 'elon musk', 'elon Musk', 'Elon Musk is awesome', "Who doesn't love Elon Musk loving Elon Musk"]

sum(len(re.findall('elon musk', x.lower())) for x in l)

Output

6
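A variant that lets the regex engine handle the case instead of lower-casing by hand. This is a sketch assuming the search term might contain characters that are special in regex syntax, so it passes the term through re.escape and uses the re.IGNORECASE flag. re.findall returns non-overlapping matches, so the total is the same as in the version above. The helper name count_occurrences is hypothetical.

```python
import re


def count_occurrences(items, term):
    """Total number of non-overlapping, case-insensitive matches of `term`.

    Hypothetical helper. re.escape() makes characters such as '.' or '+'
    in `term` match literally instead of acting as regex syntax.
    """
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return sum(len(pattern.findall(item)) for item in items)


l = ['Elon Musk', 'elon musk', 'elon Musk', 'Elon Musk is awesome',
     "Who doesn't love Elon Musk loving Elon Musk"]
print(count_occurrences(l, 'Elon Musk'))  # 6
```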

Answer:

To filter a list case-insensitively, where values is the list and search is the term you are looking for:

results = [value for value in values if search.lower() in value.lower()]

Applied to your example:

results = [result for result in stuff if 'elon musk' in result.lower()]

You can then use len(results) to get the number of results.
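The generic one-liner assumes values and search are already defined. Here is a self-contained sketch with both bound to the question's data; because the list comprehension keeps the matching strings themselves, you get both the matches and their count from one pass:

```python
stuff = ['Elon Musk', 'elon musk', 'elon Musk', 'Elon Musk is awesome', "Who doesn't love Elon Musk"]

values = stuff          # the list to search
search = 'Elon Musk'    # the term to look for; case is ignored below

# Keep every string that contains the term, comparing both sides in lower case.
results = [value for value in values if search.lower() in value.lower()]

print(len(results))  # 5
print(results[-1])   # Who doesn't love Elon Musk
```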

Answer:

One of the fastest ways is to use a list comprehension rather than a generator expression:

>>> len([x for x in stuff if 'elon musk' in x.lower()])
5

# or
>>> sum(['elon musk' in x.lower() for x in stuff])
5

Avoid sum('elon musk' in x.lower() for x in stuff), or sum() over any generator expression, when a list comprehension will do. It gives the same result, but it is slightly slower (only marginally on a list this small).
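If you want to check the speed claim on your own machine, here is a sketch using the standard timeit module. Timings depend on the Python version, the hardware and the list size, so treat any difference as indicative only; the assert confirms that both forms return the same count.

```python
import timeit

stuff = ['Elon Musk', 'elon musk', 'elon Musk', 'Elon Musk is awesome', "Who doesn't love Elon Musk"]


def with_list():
    # sum() over a fully built list of booleans (True counts as 1)
    return sum(['elon musk' in x.lower() for x in stuff])


def with_generator():
    # sum() over a lazily evaluated generator expression
    return sum('elon musk' in x.lower() for x in stuff)


assert with_list() == with_generator() == 5

for func in (with_list, with_generator):
    seconds = timeit.timeit(func, number=10_000)
    print(f'{func.__name__}: {seconds:.4f} s for 10,000 calls')
```

For very large inputs the generator version avoids materializing the intermediate list, which can matter more than a small speed difference.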

Answer:

If you want to count multiple occurrences within the same string, you can use str.count:

stuff = ['Elon Musk', 'elon musk', 'elon Musk', 'Elon Musk is awesome', "Who doesn't love Elon Musk"]

elon_musk_counts = sum(s.lower().count('elon musk') for s in stuff)
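Unlike the in checks in the other answers, str.count returns the number of non-overlapping occurrences in each string, so a string that mentions the name twice contributes 2. A sketch comparing the per-string counts of both approaches on the question's list and on the variant with a repeated name from the regex answer:

```python
stuff = ['Elon Musk', 'elon musk', 'elon Musk', 'Elon Musk is awesome', "Who doesn't love Elon Musk"]
repeated = stuff[:-1] + ["Who doesn't love Elon Musk loving Elon Musk"]

for data in (stuff, repeated):
    # 1 if the string contains the name at least once, else 0
    presence = [int('elon musk' in s.lower()) for s in data]
    # number of non-overlapping occurrences in each string
    occurrences = [s.lower().count('elon musk') for s in data]
    print(sum(presence), presence)
    print(sum(occurrences), occurrences)

# For `stuff` both totals are 5; for `repeated`, presence sums to 5
# while occurrences sums to 6.
```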