Find the pair of dictionary keys that have the most common values

I need to write a function that finds the pair of people with the most hobbies in common, that is, the pair with the highest ratio of common hobbies to different hobbies. If several pairs tie for the best ratio, it doesn't matter which of them is returned. The only exception is when several pairs share all of their hobbies, in which case the pair with the most shared hobbies should be returned.

def find_two_people_with_most_common_hobbies(data: str) -> tuple:
    new_dict = create_dictionary(data) # creates a dictionary in the form {name1: [hobby1, hobby2, ...], name2: [...]}
    value_list = [] # list that stores all hobbies, duplicates included
    for value in new_dict.items():
        for ele in value[1]:
            value_list.append(ele)
    filtered_list = set([x for x in value_list if value_list.count(x) > 1]) # set of hobbies that appear more than once overall
    return tuple([k for k, v in new_dict.items() if set(v).intersection(filtered_list)])

So, given the input "John:running\nJohn:walking\nMary:dancing\nMary:running\nNora:running\nNora:singing\nNora:dancing", the output should be ('Mary', 'Nora'). My code returns ('John', 'Mary', 'Nora') instead, because it only checks whether each person's hobbies intersect with the filtered list, so anyone with at least one repeated hobby is kept. I don't understand how to make it return only the pair with the most shared hobbies.

CodePudding user response：

I would do this as follows:

1. Turn the dictionary values into sets.
2. Get all combinations of two people.
3. Compute the intersection (common hobbies) and the symmetric difference (different hobbies) for each pair.
4. Take the max, keyed on the number of common hobbies divided by the number of different hobbies, with the number of common hobbies as the tie-breaker. Since a pair may have no different hobbies at all, the ordering value is computed in a function so that the ZeroDivisionError can be caught.

import itertools

dd = {'John': ['running', 'walking'], 'Mary': ['dancing', 'running'], 'Nora': ['running', 'singing', 'dancing']}

ss = { k : set(v) for k, v in dd.items() }
# {'John': {'walking', 'running'}, 'Mary': {'dancing', 'running'}, 'Nora': {'singing', 'running', 'dancing'}}

pp = list(itertools.combinations(dd, 2))
# [('John', 'Mary'), ('John', 'Nora'), ('Mary', 'Nora')]

hh = { (p1, p2) : (len(ss[p1] & ss[p2]), len(ss[p1] ^ ss[p2])) for p1, p2 in pp }
# {('John', 'Mary'): (1, 2), ('John', 'Nora'): (1, 3), ('Mary', 'Nora'): (2, 1)}

def most_shared(key):
    try:
        ratio = hh[key][0] / hh[key][1]
    except ZeroDivisionError:
        ratio = float('inf')
    return (ratio, hh[key][0])

res = max(hh, key=most_shared)
# ('Mary', 'Nora')
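
To match the signature from the question, the steps above could be wrapped into a single function that takes the raw string. This is only a sketch: parse_hobbies is a hypothetical stand-in for create_dictionary, which the question does not show, and it assumes every line has the form "name:hobby" and that there are at least two people (otherwise max raises ValueError).

```python
import itertools


def parse_hobbies(data: str) -> dict:
    """Hypothetical stand-in for create_dictionary: {name: set of hobbies}."""
    hobbies = {}
    for line in data.splitlines():
        name, hobby = line.split(":", 1)
        hobbies.setdefault(name, set()).add(hobby)
    return hobbies


def find_two_people_with_most_common_hobbies(data: str) -> tuple:
    hobbies = parse_hobbies(data)

    def score(pair):
        first, second = pair
        common = len(hobbies[first] & hobbies[second])
        different = len(hobbies[first] ^ hobbies[second])
        # No differing hobbies: treat the ratio as infinite so these pairs
        # rank first; the shared-hobby count then breaks the tie.
        ratio = common / different if different else float("inf")
        return (ratio, common)

    # Iterating over the dict yields its keys, i.e. the names.
    return max(itertools.combinations(hobbies, 2), key=score)


data = "John:running\nJohn:walking\nMary:dancing\nMary:running\nNora:running\nNora:singing\nNora:dancing"
print(find_two_people_with_most_common_hobbies(data))  # ('Mary', 'Nora')
```

Checking for a zero count directly does the same job as catching ZeroDivisionError; which one to use is a matter of taste.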

CodePudding user response：

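s = "John:running\nJohn:walking\nMary:dancing\nMary:running\nNora:running\nNora:singing\nNora:dancing"

# Build {name: [hobby, ...]} from the "name:hobby" lines.
d = {}
for line in s.split('\n'):
    k, v = line.split(':')
    if k in d:
        d[k].append(v)
    else:
        d[k] = [v]

# Print every ordered pair (k1, k2) where all of k1's hobbies are also k2's.
for k1, v1 in d.items():
    for k2, v2 in d.items():
        if k1 != k2:
            for v in v1:
                if v not in v2:
                    break
            else:
                # The inner loop finished without a break: v1 is contained in v2.
                print(k1, k2)
# Mary Nora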
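
As far as I can tell, the nested loops above print every ordered pair where the first person's hobbies are all among the second person's. Under that reading, the same check can be sketched with set comparison. The name subset_pairs and the hard-coded dictionary are introduced here for illustration only. Keep in mind that this is a subset test, not the ratio rule from the question, so the two can disagree on other inputs.

```python
import itertools

hobbies = {
    "John": {"running", "walking"},
    "Mary": {"dancing", "running"},
    "Nora": {"running", "singing", "dancing"},
}


def subset_pairs(hobbies: dict) -> list:
    """Return ordered pairs (a, b) where every hobby of a is also a hobby of b."""
    return [
        (a, b)
        for a, b in itertools.permutations(hobbies, 2)
        if hobbies[a] <= hobbies[b]  # set "is subset of" comparison
    ]


print(subset_pairs(hobbies))  # [('Mary', 'Nora')]
```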