Creating a table comparing two dictionaries with lists as values

I have two dictionaries. In each one, a key is a column header and its value is the list of that column's values. I would like to create a table that compares every list in the first dictionary with every list in the second, for example the percentage of list items they have in common. I am not sure how to start.

Dict1={ key1a : list1a, key1b : list1b, ...}

Dict2={ key2a : list2a, key2b : list2b, ...}

I would like the column headers to be the keys of Dict1 and the row headers to be the keys of Dict2:

+-------+-------------------------------+-------------------------------+-----+
| KEYS  |             key1a             |             key1b             | ... |
+-------+-------------------------------+-------------------------------+-----+
| key2a | % list1a and list2a in common | % list1b and list2a in common | ... |
| key2b | % list1a and list2b in common | % list1b and list2b in common | ... |
|  ...  |              ...              |              ...              | ... |
+-------+-------------------------------+-------------------------------+-----+

CodePudding user response：

It isn't completely clear what you mean by the lists having items "in common". Are they lists of equal length? Do they have unique values?

Below is an implementation that creates a pandas DataFrame by iterating through the keys of both dictionaries and building a new dictionary of lists. I assume that the entries within each list are unique, and each value is the fraction of the Dict1 list's items that also appear in the Dict2 list. You can adjust that part of the code as necessary for your use case.

import pandas as pd

# Example dictionaries and lists
Dict1 = {'key1a': [1,2,3,4,5], 'key1b':[2,3,4,5,6]}
Dict2 = {'key2a': [1,'a',3,4,5], 'key2b':[2,3,'b',5,6]}

compare_dict = {}
for key1 in Dict1:
    compare_list = []
    for key2 in Dict2:
        count_common = 0
        for item in Dict1.get(key1):
            if item in Dict2.get(key2):
                count_common += 1
        compare_list.append(count_common/len(Dict1.get(key1)))
    compare_dict[key1] = compare_list

df = pd.DataFrame(compare_dict, index = Dict2.keys())
print(df)

print output:
       key1a  key1b
key2a    0.8    0.6
key2b    0.6    0.8
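
If the lists can contain duplicates, one possible adjustment is to compare them as sets. The sketch below assumes that "in common" means the fraction of the distinct items of the Dict1 list that also occur in the Dict2 list, and that every item is hashable. The helper name fraction_common and the variable compare_sets are made up for illustration.

```python
# Same example data as above
Dict1 = {'key1a': [1, 2, 3, 4, 5], 'key1b': [2, 3, 4, 5, 6]}
Dict2 = {'key2a': [1, 'a', 3, 4, 5], 'key2b': [2, 3, 'b', 5, 6]}


def fraction_common(list1, list2):
    """Hypothetical helper: fraction of the distinct items of list1 that also occur in list2."""
    set1, set2 = set(list1), set(list2)
    if not set1:
        return 0.0  # assumption: an empty list shares nothing
    return len(set1 & set2) / len(set1)


# Same shape as compare_dict: one list of fractions per Dict1 key, ordered like Dict2
compare_sets = {
    key1: [fraction_common(list1, list2) for list2 in Dict2.values()]
    for key1, list1 in Dict1.items()
}
print(compare_sets)
```

Passing compare_sets to pd.DataFrame with index=list(Dict2) should give the same table layout as compare_dict. Set membership tests also avoid rescanning the second list for every item, which may matter for long lists.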

CodePudding user response：

It is not clear exactly what output you expect in the table, because that value (percent in common) depends on the lengths of the lists, and you did not say whether they all have the same length. This code outputs the elements in common between the lists:

dict1 = {'key1a': list(range(5)), 'key1b': list(range(6))}
dict2 = {'key2a': list(range(5)), 'key2b': list(range(6))}


def elements_in_common(list1: list, list2: list) -> list:
    # Iterate over the longer list and keep the elements that also appear in the shorter one
    biggest_list = list1 if len(list1) >= len(list2) else list2
    smallest_list = list2 if biggest_list is list1 else list1

    return [element for element in biggest_list if element in smallest_list]


table = {}

for dict1_key, dict1_val in dict1.items():
    table[dict1_key] = {}

    for dict2_key, dict2_val in dict2.items():
        table[dict1_key][dict2_key] = elements_in_common(dict1_val, dict2_val)

print(table)

The result is easy to convert to a percentage: divide the length of each resulting list by the length of the original list you want to measure against (a value from one of the initial dicts). For example, replace this line

        table[dict1_key][dict2_key] = elements_in_common(dict1_val, dict2_val)

with this line

        table[dict1_key][dict2_key] = len(elements_in_common(dict1_val, dict2_val)) / max(len(dict1_val), len(dict2_val)) * 100

This gives the percentage of elements in common relative to the longer of the two lists.
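
To get something close to the layout in the question, the nested dict can be printed as a plain-text table. This is only a sketch: it assumes unique entries within each list, and the helper percent_in_common and the column widths are chosen for illustration.

```python
dict1 = {'key1a': list(range(5)), 'key1b': list(range(6))}
dict2 = {'key2a': list(range(5)), 'key2b': list(range(6))}


def percent_in_common(list1, list2):
    """Hypothetical helper: items shared by both lists, as a percent of the longer list."""
    longest = max(len(list1), len(list2))
    if longest == 0:
        return 0.0  # assumption: two empty lists count as 0 percent
    shared = [element for element in list1 if element in list2]
    return len(shared) / longest * 100


# table[dict1_key][dict2_key], the same nesting as in the code above
table = {
    key1: {key2: percent_in_common(val1, val2) for key2, val2 in dict2.items()}
    for key1, val1 in dict1.items()
}

# dict1 keys become the column headers, dict2 keys the row headers
header = f"{'KEYS':<8}" + "".join(f"{key1:>10}" for key1 in dict1)
rows = [
    f"{key2:<8}" + "".join(f"{table[key1][key2]:>10.1f}" for key1 in dict1)
    for key2 in dict2
]
text_table = "\n".join([header, *rows])
print(text_table)
```

Each row is built by looking up table[key1][key2] for every dict1 key, so the columns stay in the insertion order of dict1.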