How to find the most common name in 2 related lists

Time:06-29

I would like to ask the community for help. I have two related lists:

names = ['alan_grant', 'alan_grant', 'alan_grant', 'alan_grant', 'alan_grant', 'claire_dearing', 'claire_dearing', 'claire_dearing', 'claire_dearing', 'claire_dearing', 'ellie_sattler', 'ellie_sattler', 'ellie_sattler', 'ellie_sattler', 'ellie_sattler', 'ian_malcolm', 'ian_malcolm', 'ian_malcolm', 'ian_malcolm', 'ian_malcolm', 'john_hammond', 'john_hammond', 'john_hammond', 'john_hammond', 'john_hammond', 'owen_grady', 'owen_grady', 'owen_grady', 'owen_grady', 'owen_grady']
votes = [True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, True, True, True]

The votes list is the result of a facial recognition algorithm, with each vote matched to the entry at the same position in the names list. I want to link each True vote to its corresponding name and pick the most frequently occurring name as the final 'winner'.

I have tried 2 ways:

characters = {}
for name, vote in list(zip(names, votes)):
    if vote == True:
        characters[name] = characters.get(name, 0) + 1
#print(characters)
print(max(characters, key=characters.get))

The output is 'owen_grady'

from collections import Counter

characters = [name for name, vote in list(zip(names, votes)) if vote == True]
#print(characters)
print(Counter(characters).most_common()[0][0])

The output is also 'owen_grady'. Which way is more efficient: the dictionary, or the list comprehension with Counter?

My ultimate question: is there another, more efficient way to get the result? I would like the output to be just 'owen_grady'.

CodePudding user response:

You can keep using Counter, combined with the zip() function: count the (name, vote) pairs, then delete every pair whose vote is False:

names = ['alan_grant', 'alan_grant', 'alan_grant', 'alan_grant', 'alan_grant', 'claire_dearing', 'claire_dearing', 'claire_dearing', 'claire_dearing', 'claire_dearing', 'ellie_sattler', 'ellie_sattler', 'ellie_sattler', 'ellie_sattler', 'ellie_sattler', 'ian_malcolm', 'ian_malcolm', 'ian_malcolm', 'ian_malcolm', 'ian_malcolm', 'john_hammond', 'john_hammond', 'john_hammond', 'john_hammond', 'john_hammond', 'owen_grady', 'owen_grady', 'owen_grady', 'owen_grady', 'owen_grady']
votes = [True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, True, True, True]

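from collections import Counter

# Count every (name, vote) pair, then drop the pairs whose vote is False.
r = Counter(zip(names, votes))
for i in list(r.keys()):  # copy the keys so entries can be deleted while looping
    if not i[1]:
        del r[i]
print(r)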

Output:

Counter({('owen_grady', True): 4, ('alan_grant', True): 1})

CodePudding user response:

You can use itertools.compress() to filter out all the False entries. The Counter option should be the most efficient; just pass the n argument to .most_common() so that it returns a single pair.

Code:

from itertools import compress
from collections import Counter

names = ['alan_grant', 'alan_grant', 'alan_grant', 'alan_grant', 'alan_grant', 'claire_dearing', 'claire_dearing', 'claire_dearing', 'claire_dearing', 'claire_dearing', 'ellie_sattler', 'ellie_sattler', 'ellie_sattler', 'ellie_sattler', 'ellie_sattler', 'ian_malcolm', 'ian_malcolm', 'ian_malcolm', 'ian_malcolm', 'ian_malcolm', 'john_hammond', 'john_hammond', 'john_hammond', 'john_hammond', 'john_hammond', 'owen_grady', 'owen_grady', 'owen_grady', 'owen_grady', 'owen_grady']
votes = [True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, True, True, True]

most_common = Counter(compress(names, votes)).most_common(1)[0][0]
# Or, with unpacking as syntactic sugar:
# [(most_common, _)] = Counter(compress(names, votes)).most_common(1)
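
One edge case the one-liner above does not cover: if every vote is False, most_common(1) returns an empty list and indexing it with [0] raises IndexError. A small sketch of a guarded version is shown below; the function name winner and its default parameter are hypothetical names introduced only for illustration.

```python
from collections import Counter
from itertools import compress


def winner(names, votes, default=None):
    """Return the name with the most True votes, or `default` if there are none."""
    # compress() yields only the names whose matching vote is truthy.
    top = Counter(compress(names, votes)).most_common(1)
    # `top` is [(name, count)] when something matched, otherwise [].
    return top[0][0] if top else default


print(winner(['a', 'a', 'b'], [True, True, True]))  # a
print(winner(['a', 'b'], [False, False]))           # None
```

Returning a default instead of raising is a design choice; a caller that treats "no match" as an error may prefer to let the exception propagate.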

Update: I ran some benchmarks, and for this particular case a slightly optimized version of the first method seems to perform better:

from itertools import compress

names = ['alan_grant', 'alan_grant', 'alan_grant', 'alan_grant', 'alan_grant', 'claire_dearing', 'claire_dearing', 'claire_dearing', 'claire_dearing', 'claire_dearing', 'ellie_sattler', 'ellie_sattler', 'ellie_sattler', 'ellie_sattler', 'ellie_sattler', 'ian_malcolm', 'ian_malcolm', 'ian_malcolm', 'ian_malcolm', 'ian_malcolm', 'john_hammond', 'john_hammond', 'john_hammond', 'john_hammond', 'john_hammond', 'owen_grady', 'owen_grady', 'owen_grady', 'owen_grady', 'owen_grady']
votes = [True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, True, True, True]

characters = list(compress(names, votes))
most_common = max(set(characters), key=characters.count)
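
The benchmark itself is not shown in the answer, so the following is only a sketch of how such a comparison could be reproduced with timeit. The data is rebuilt programmatically to mirror the shape of the lists in the question, and the three function names are hypothetical. Timings depend on the machine and Python version, so no results are quoted here.

```python
import timeit
from collections import Counter
from itertools import compress

# Rebuild data shaped like the question's lists: 6 names, 5 entries each.
people = ['alan_grant', 'claire_dearing', 'ellie_sattler',
          'ian_malcolm', 'john_hammond', 'owen_grady']
names = [person for person in people for _ in range(5)]
votes = [True] + [False] * 25 + [True] * 4


def with_dict():
    # Plain dictionary tally, as in the question's first attempt.
    counts = {}
    for name, vote in zip(names, votes):
        if vote:
            counts[name] = counts.get(name, 0) + 1
    return max(counts, key=counts.get)


def with_counter():
    # Counter over the names kept by compress().
    return Counter(compress(names, votes)).most_common(1)[0][0]


def with_list_count():
    # list.count() called once per distinct surviving name.
    chosen = list(compress(names, votes))
    return max(set(chosen), key=chosen.count)


for fn in (with_dict, with_counter, with_list_count):
    seconds = timeit.timeit(fn, number=20_000)
    print(f"{fn.__name__}: {seconds:.4f} s for 20,000 calls")
```

Keep in mind that the list.count() variant rescans the filtered list once per distinct name, so it may fall behind the single-pass approaches as the number of distinct names grows, even if it does well on a list this small.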

