I have a dictionary in which both the keys and the values are tuples (a 1-to-1 relationship). The keys are names and the values are IDs.
I want to find out which names are most likely to appear together with which IDs. For example, 'James' often appears with 'Gamma', 'Harper' often comes with 'Delta', and so on.
What I tried is to list the most frequent elements among the keys and among the values, and then guess the likely pairings by hand.
d = {('Amelia', 'James', 'Noah'): ('Iota', 'Epsilon', 'Gamma'),
     ('James', 'Lucas', 'Elijah'): ('Beta', 'Theta', 'Eta'),
     ('Harper', 'Emma', 'Ava'): ('Eta', 'Iota', 'Delta'),
     ('Harper', 'James', 'Amelia'): ('Gamma', 'Delta', 'Epsilon'),
     ('Olivia', 'James', 'Liam'): ('Zeta', 'Gamma', 'Eta'),
     ('Oliver', 'Charlotte', 'Evelyn'): ('Iota', 'Alpha', 'Eta'),
     ('Elijah', 'Oliver', 'James'): ('Gamma', 'Zeta', 'Epsilon'),
     ('Ethan', 'Harper', 'Emma'): ('Alpha', 'Epsilon', 'Delta')}
getting_keys = list(d.keys())
# putting all elements in keys into a list
keys = [item for t in getting_keys for item in t]
# get a list of unique keys
unique_keys = set(keys)
# print the counts of occurrence of each unique key
for k in unique_keys:
    print(k, keys.count(k))
getting_values = list(d.values())
# putting all elements in values into a list
values = [item for t in getting_values for item in t]
# get a list of unique values
unique_values = set(values)
# print the counts of occurrence of each unique value
for v in unique_values:
    print(v, values.count(v))
The printout shows that 'James' appears most often among the keys (5 times) and 'Gamma' appears most often among the values (4 times), which suggests that 'James' often comes with 'Gamma'.
Can anyone suggest a better way to find such pairings? Thank you.
CodePudding user response:
Using collections.Counter(), we can build a two-way map of counters, from each name to the IDs it shares a row with and from each ID to the names it shares a row with. This data is prepared once and can then be queried with any name or ID to find the counterpart it appeared with most often.
from collections import Counter
data = {('Amelia', 'James', 'Noah'): ('Iota', 'Epsilon', 'Gamma'),
        ('James', 'Lucas', 'Elijah'): ('Beta', 'Theta', 'Eta'),
        ('Harper', 'Emma', 'Ava'): ('Eta', 'Iota', 'Delta'),
        ('Harper', 'James', 'Amelia'): ('Gamma', 'Delta', 'Epsilon'),
        ('Olivia', 'James', 'Liam'): ('Zeta', 'Gamma', 'Eta'),
        ('Oliver', 'Charlotte', 'Evelyn'): ('Iota', 'Alpha', 'Eta'),
        ('Elijah', 'Oliver', 'James'): ('Gamma', 'Zeta', 'Epsilon'),
        ('Ethan', 'Harper', 'Emma'): ('Alpha', 'Epsilon', 'Delta')}
def prepare_values(data):
    """Prepare a counter both ways, key to value and value to key."""
    relation_data = {}
    for key, value in data.items():
        for k in key:
            for v in value:
                # increment (not assign) so repeated co-occurrences accumulate
                relation_data.setdefault(k, Counter())[v] += 1
                relation_data.setdefault(v, Counter())[k] += 1
    return relation_data
def find_likelyhood(data, name, with_count=False):
    # a Counter returns 0 for missing entries, so unrelated items never win
    max_value = max(data, key=lambda x: data[x][name])
    if with_count:
        return max_value, data[max_value][name]
    return max_value
prepared_data = prepare_values(data)
print(find_likelyhood(prepared_data, 'James')) # Gamma
print(find_likelyhood(prepared_data, 'Harper')) # Delta
print(find_likelyhood(prepared_data, 'Delta', with_count=True)) # ('Harper', 3)