Finding the difference in value counts by keys in two Dictionaries

Time:02-12

I have two sample Python dictionaries that count how many times each key appears in a DataFrame.

dict1 =  {
           2000 : 2,
           3000 : 3,
           4000 : 4,
           5000 : 6,
           6000 : 8
          }

dict2 = {
           4000 : 4,
           3000 : 3,
           2000 : 4,
           6000 : 10,
           5000 : 4
          }

I would like to output the following where there is a difference.

diff = {
        2000 : 2,
        5000 : 2,
        6000 : 2
       }

I would appreciate any help, as I am not familiar with iterating through dictionaries. Even output that only shows which keys have different values would work for me. I tried the following, but it does not produce any output.

for (k,v), (k2,v2) in zip(dict1.items(), dict2.items()):
    if k == k2:
        if v == v2:
            pass
        else:
            print('value is different at k')

CodePudding user response:

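Your loop prints nothing because zip pairs the items by position, and the two dicts store their keys in different orders. With this data, k == k2 is True only for the pair at key 3000, where the values are also equal, so the print is never reached.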
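As a minimal sketch of a loop that does report the differences, assuming both dicts contain the same keys as in the question, you can iterate over one dict and look each key up in the other instead of relying on position. The list name changed_keys is introduced here only for illustration.

```python
dict1 = {2000: 2, 3000: 3, 4000: 4, 5000: 6, 6000: 8}
dict2 = {4000: 4, 3000: 3, 2000: 4, 6000: 10, 5000: 4}

changed_keys = []  # illustrative name, not from the question
for k, v in dict1.items():
    # Look the key up directly, so the order of the dicts no longer matters.
    if v != dict2[k]:
        print(f"value is different at {k}")
        changed_keys.append(k)
```

With the sample data this reports the keys 2000, 5000 and 6000.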

You could use a dict comprehension where you traverse dict1 and subtract the value in dict2 with the matching key:

diff = {k: abs(v - dict2[k]) for k, v in dict1.items()}

Output:

{2000: 2, 3000: 0, 4000: 0, 5000: 2, 6000: 2}

If you have Python >=3.8, and you want only key-value pairs where value > 0, then you could also use the walrus operator:

diff = {k: di for k, v in dict1.items() if (di := abs(v - dict2[k])) > 0}

Output:

{2000: 2, 5000: 2, 6000: 2}
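Both comprehensions raise a KeyError if a key of dict1 is missing from dict2, and they ignore keys that exist only in dict2. The sample data has identical key sets, so this does not matter there. If your real counts can differ in keys, here is a hedged sketch that treats a missing key as a count of 0. That is an assumption, since the question does not say how such keys should be handled, and count_diff is a hypothetical helper name.

```python
def count_diff(a, b):
    """Absolute per-key difference of two count dicts.

    Assumption: a key missing from one dict counts as 0 there.
    """
    diff = {}
    for k in a.keys() | b.keys():  # union of the keys of both dicts
        d = abs(a.get(k, 0) - b.get(k, 0))
        if d:  # keep only keys whose counts differ
            diff[k] = d
    return diff


dict1 = {2000: 2, 3000: 3, 4000: 4, 5000: 6, 6000: 8}
dict2 = {4000: 4, 3000: 3, 2000: 4, 6000: 10, 5000: 4}

result = count_diff(dict1, dict2)   # same pairs as the walrus version
extra = count_diff({1: 5}, {2: 3})  # keys present on one side only
```

The key union is a set, so the order of keys in the resulting dict is not guaranteed. Sort the items if you need a stable order.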

Since you tagged the question pandas, you can do the same thing in pandas.

First, convert the dicts to DataFrame objects and join them. Because join aligns on the index by default and the indexes are the dict keys, the two counts for each key end up side by side in one row. Then take the difference across the columns with diff(axis=1), apply abs, and drop the first column, which diff leaves as NaN.

import pandas as pd

df1 = pd.DataFrame.from_dict(dict1, orient='index')
df2 = pd.DataFrame.from_dict(dict2, orient='index')
out = df1.join(df2, lsuffix='_x', rsuffix='').diff(axis=1).abs().dropna(axis=1)['0']

Output:

2000    2
3000    0
4000    0
5000    2
6000    2
Name: 0, dtype: int64