Convert pandas DataFrame to a multi-key dictionary where key order is irrelevant

Time:12-25

I would like to convert a pandas DataFrame to a multi-key dictionary, using two or more columns as the dictionary key, and I would like the order of those key columns to be irrelevant.

Here's an example of converting a pandas DataFrame to a regular multi-key dictionary, where order is relevant:

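import pandas as pd
import numpy as np

# 5 rows x 3 columns of random integers in [0, 100), so your values will differ
df = pd.DataFrame(np.random.randint(0, 100, size=(5, 3)), columns=list('ABC'))

# (B, C) tuples become the keys, column A supplies the values
df_dict = df.set_index(['B', 'C']).to_dict()['A']
print(df_dict)
{(33, 21): 85, (61, 46): 88, (78, 12): 48, (89, 18): 65, (91, 19): 41}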

So df_dict[(33, 21)] returns 85, but df_dict[(21, 33)] raises a KeyError.

Potential Solutions

This Stack Overflow question covers ways to build dictionaries whose key order is irrelevant, using sorted, tuple, Counter, and/or frozenset:

Multiples-keys dictionary where key order doesn't matter
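
The key-normalization ideas named in that question can be sketched in plain Python. This is a minimal illustration, not code from the linked thread; the helper names sorted_key and multiset_key are hypothetical. It also shows the one place they differ: frozenset drops repeated values, which only matters once a key has three or more columns.

```python
from collections import Counter


def sorted_key(parts):
    """Hypothetical helper: every permutation sorts to the same tuple.

    Assumes the key elements are mutually comparable (e.g. all ints).
    """
    return tuple(sorted(parts))


def multiset_key(parts):
    """Hypothetical helper: order-independent key that keeps duplicates.

    Counter is a dict subclass and therefore unhashable, so its
    (element, count) pairs are frozen into a frozenset instead.
    """
    return frozenset(Counter(parts).items())


# All three forms treat (33, 21) and (21, 33) as the same key
print(sorted_key((33, 21)) == sorted_key((21, 33)))      # True
print(frozenset((33, 21)) == frozenset((21, 33)))        # True
print(multiset_key((33, 21)) == multiset_key((21, 33)))  # True

# With repeated values, plain frozenset collapses different keys together
print(frozenset((5, 5, 7)) == frozenset((5, 7, 7)))        # True
print(multiset_key((5, 5, 7)) == multiset_key((5, 7, 7)))  # False
print(sorted_key((5, 5, 7)) == sorted_key((5, 7, 7)))      # False
```

For two-column keys such as (B, C), frozenset is safe: the only repeated case is (x, x), and any lookup of that pair also reduces to frozenset({x}).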

However, I don't see an obvious way to combine these data types and functions with pandas' conversion methods.

My next idea was to convert the dictionary keys after converting the DataFrame.

I tried this:

new_d = {frozenset(key): value for key, value in df_dict}

but got this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-49-6a3244440ac2> in <module>()
----> 1 new_d = {frozenset(key): value for key, value in df_dict}
      2 new_d

<ipython-input-49-6a3244440ac2> in <dictcomp>(.0)
----> 1 new_d = {frozenset(key): value for key, value in df_dict}
      2 new_d

TypeError: 'int' object is not iterable

Answer:

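You need to loop over df_dict.items() instead of just df_dict ;) Iterating over a dict yields only its keys, so key, value unpacks each (B, C) key tuple into two ints, and frozenset() is then called on a single int, which is not iterable.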

>>> new_d = {frozenset(key): value for key, value in df_dict.items()}
>>> new_d
{frozenset({10, 99}): 92,
 frozenset({60, 76}): 54,
 frozenset({6, 20}): 31,
 frozenset({36, 46}): 31,
 frozenset({3, 68}): 59}
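
A minimal pure-Python sketch of the failure and the fix, using two entries copied from the question's printed df_dict (no pandas needed to reproduce it):

```python
# Two entries copied from the question's printed df_dict
df_dict = {(33, 21): 85, (61, 46): 88}

# Looping over the dict itself: each item is a key tuple, unpacked into two ints
for key, value in df_dict:
    print(key, value)  # prints "33 21", then "61 46"

# frozenset(33) is the call that raised the TypeError in the question
try:
    frozenset(33)
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # 'int' object is not iterable

# Looping over .items() yields (key, value) pairs, as intended
new_d = {frozenset(key): value for key, value in df_dict.items()}

# Lookups now succeed in either order
print(new_d[frozenset((33, 21))], new_d[frozenset((21, 33))])  # 85 85
```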

Answer:

Why not create it directly from df:

# one frozenset of (B, C) per row, paired with that row's A value
d = dict(zip(df[['B', 'C']].apply(frozenset, axis=1), df['A']))
d
{frozenset({72, 12}): 34, frozenset({98, 76}): 82, frozenset({67, 7}): 35, frozenset({60, 70}): 18, frozenset({8, 53}): 81}
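
One caveat that applies to the frozenset approach: two rows holding the same pair in opposite order produce the same key, so the later row silently overwrites the earlier one. A minimal sketch on a small hand-made frame (values chosen here for illustration, not from the thread), alongside a sorted-tuple variant built with numpy.sort along axis=1, which avoids the row-wise apply:

```python
import numpy as np
import pandas as pd

# Hand-made frame: rows 0 and 2 hold the pair (72, 12) in opposite order
df = pd.DataFrame({'A': [34, 82, 99], 'B': [72, 98, 12], 'C': [12, 76, 72]})

# frozenset keys, built row by row as in the answer above
fs_dict = dict(zip(df[['B', 'C']].apply(frozenset, axis=1), df['A']))

# Sorted-tuple keys: sort each row's (B, C) values, then turn each row into a tuple
pairs = np.sort(df[['B', 'C']].to_numpy(), axis=1)
st_dict = dict(zip(map(tuple, pairs.tolist()), df['A']))

# Both dicts keep only the last row for the duplicated pair
print(fs_dict[frozenset((12, 72))])      # 99
print(st_dict[tuple(sorted((72, 12)))])  # 99
print(len(fs_dict), len(st_dict))        # 2 2
```

With sorted-tuple keys, every lookup has to be normalized the same way (here tuple(sorted(...))); frozenset keys need the same care, since a plain (12, 72) tuple will not match either.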