Finding relationships between elements of a dictionary whose keys and values are both tuples

Time:11-10

I have a dictionary in which both the keys and the values are tuples (1-to-1 relationship). The keys are names and the values are IDs.

I want to find out which names are most likely to appear together with which IDs. For example, 'James' often appears with 'Gamma', 'Harper' often comes with 'Delta', etc.


What I tried is basically to list the most frequent elements among the keys and among the values, and then guess their likelihood manually.

d = {('Amelia', 'James', 'Noah'):('Iota', 'Epsilon', 'Gamma'),
('James', 'Lucas', 'Elijah'):('Beta', 'Theta', 'Eta'),
('Harper', 'Emma', 'Ava'):('Eta', 'Iota', 'Delta'),
('Harper', 'James', 'Amelia'):('Gamma', 'Delta', 'Epsilon'),
('Olivia', 'James', 'Liam'):('Zeta', 'Gamma', 'Eta'),
('Oliver', 'Charlotte', 'Evelyn'):('Iota', 'Alpha', 'Eta'),
('Elijah', 'Oliver', 'James'):('Gamma', 'Zeta', 'Epsilon'),
('Ethan', 'Harper', 'Emma'):('Alpha', 'Epsilon', 'Delta')}


getting_keys = list(d.keys())

# putting all elements in keys into a list
keys = [item for t in getting_keys for item in t]

# get a list of unique keys
unique_keys = set(keys)

# print the number of occurrences of each unique key
for k in unique_keys:
    print(k, keys.count(k))


getting_values = list(d.values())

# putting all elements in values into a list
values = [item for t in getting_values for item in t]

# get a list of unique values
unique_values = set(values)

# print the number of occurrences of each unique value
for v in unique_values:
    print(v, values.count(v))

The printout shows that 'James' appears most often among the keys (5 times) and that 'Gamma' is among the most frequent values (4 times). From this I concluded that 'James' often comes with 'Gamma'.

Could anyone suggest a better way to find this out? Thank you.
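For reference, the same per-element counts can be produced more compactly. This is only a minimal sketch using collections.Counter and itertools.chain from the standard library; the names name_counts and id_counts are illustrative and not part of the original code. It also shows why the separate counts are a weak basis for pairing: 'Gamma', 'Eta' and 'Epsilon' all appear 4 times among the values, so frequency alone does not single out 'Gamma'.

```python
from collections import Counter
from itertools import chain

# Same dictionary as in the question.
d = {('Amelia', 'James', 'Noah'): ('Iota', 'Epsilon', 'Gamma'),
     ('James', 'Lucas', 'Elijah'): ('Beta', 'Theta', 'Eta'),
     ('Harper', 'Emma', 'Ava'): ('Eta', 'Iota', 'Delta'),
     ('Harper', 'James', 'Amelia'): ('Gamma', 'Delta', 'Epsilon'),
     ('Olivia', 'James', 'Liam'): ('Zeta', 'Gamma', 'Eta'),
     ('Oliver', 'Charlotte', 'Evelyn'): ('Iota', 'Alpha', 'Eta'),
     ('Elijah', 'Oliver', 'James'): ('Gamma', 'Zeta', 'Epsilon'),
     ('Ethan', 'Harper', 'Emma'): ('Alpha', 'Epsilon', 'Delta')}

# Flatten all key tuples (names) and all value tuples (IDs), then count.
name_counts = Counter(chain.from_iterable(d.keys()))
id_counts = Counter(chain.from_iterable(d.values()))

print(name_counts['James'])  # 5
# Three IDs tie at 4 occurrences each.
print(sorted(i for i, c in id_counts.items() if c == 4))
# ['Epsilon', 'Eta', 'Gamma']
```

These counts are computed independently for names and IDs, so they say nothing about which name and which ID actually occurred in the same entry.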

CodePudding user response:

Using collections.Counter we can build a two-way map of counters, from names to IDs and from IDs to names. This data is prepared once and can then be queried with a name to find the ID it appeared with most often, or with an ID to find the name it appeared with most often.

from collections import Counter

data = {('Amelia', 'James', 'Noah'):('Iota', 'Epsilon', 'Gamma'),
('James', 'Lucas', 'Elijah'):('Beta', 'Theta', 'Eta'),
('Harper', 'Emma', 'Ava'):('Eta', 'Iota', 'Delta'),
('Harper', 'James', 'Amelia'):('Gamma', 'Delta', 'Epsilon'),
('Olivia', 'James', 'Liam'):('Zeta', 'Gamma', 'Eta'),
('Oliver', 'Charlotte', 'Evelyn'):('Iota', 'Alpha', 'Eta'),
('Elijah', 'Oliver', 'James'):('Gamma', 'Zeta', 'Epsilon'),
('Ethan', 'Harper', 'Emma'):('Alpha', 'Epsilon', 'Delta')}


def prepare_values(data):
    """Prepare a counter both ways, key to value and value to key."""
    relation_data = {}
    for key, value in data.items():
        for k in key:
            for v in value:
                # increment the co-occurrence count in both directions
                relation_data.setdefault(k, Counter())[v] += 1
                relation_data.setdefault(v, Counter())[k] += 1

    return relation_data


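def find_likelyhood(data, name, with_count=False):
    """Return the element that co-occurred most often with name."""
    # Counter returns 0 for missing entries, so unrelated elements never win
    max_value = max(data, key=lambda x: data[x][name])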
    if with_count:
        return max_value, data[max_value][name]
    return max_value


prepared_data = prepare_values(data)
print(find_likelyhood(prepared_data, 'James')) # Gamma
print(find_likelyhood(prepared_data, 'Harper')) # Delta
print(find_likelyhood(prepared_data, 'Delta', with_count=True))  # ('Harper', 3)
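If a ranking is more useful than a single best match, the same idea can be extended. The following is only a sketch under stated assumptions, not part of the answer above: build_cooccurrence and top_ids are hypothetical helper names, it only maps names to IDs, and the "share" is simply the fraction of a name's entries that contain the given ID (which assumes an ID is not repeated within one tuple).

```python
from collections import Counter, defaultdict

data = {('Amelia', 'James', 'Noah'): ('Iota', 'Epsilon', 'Gamma'),
        ('James', 'Lucas', 'Elijah'): ('Beta', 'Theta', 'Eta'),
        ('Harper', 'Emma', 'Ava'): ('Eta', 'Iota', 'Delta'),
        ('Harper', 'James', 'Amelia'): ('Gamma', 'Delta', 'Epsilon'),
        ('Olivia', 'James', 'Liam'): ('Zeta', 'Gamma', 'Eta'),
        ('Oliver', 'Charlotte', 'Evelyn'): ('Iota', 'Alpha', 'Eta'),
        ('Elijah', 'Oliver', 'James'): ('Gamma', 'Zeta', 'Epsilon'),
        ('Ethan', 'Harper', 'Emma'): ('Alpha', 'Epsilon', 'Delta')}


def build_cooccurrence(mapping):
    """Hypothetical helper: count, per name, the IDs found in the same entry."""
    co = defaultdict(Counter)   # name -> Counter of IDs
    appearances = Counter()     # name -> number of entries containing it
    for names, ids in mapping.items():
        for name in names:
            appearances[name] += 1
            co[name].update(ids)
    return co, appearances


def top_ids(co, appearances, name, n=3):
    """Return up to n (id, count, share) triples, most frequent first."""
    return [(id_, count, count / appearances[name])
            for id_, count in co[name].most_common(n)]


co, appearances = build_cooccurrence(data)
print(top_ids(co, appearances, 'James', n=2))
# [('Gamma', 4, 0.8), ('Epsilon', 3, 0.6)]
```

Here 'Gamma' appears in 4 of the 5 entries that contain 'James', which gives a rough measure of how strong the association is rather than only which ID comes first.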