I would like to convert a pandas DataFrame to a multi-key dictionary, using two or more columns as the dictionary key, and I would like the order of those keys to be irrelevant.
Here's an example of converting a pandas DataFrame to a regular multi-key dictionary, where order is relevant.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,100,size=(5, 3)), columns=list('ABC'))
df_dict = df.set_index(['B', 'C']).to_dict()['A']
print(df_dict)
{(33, 21): 85, (61, 46): 88, (78, 12): 48, (89, 18): 65, (91, 19): 41}
So df_dict[(33, 21)] returns 85, but df_dict[(21, 33)] raises a KeyError.
Potential Solutions
This SO question covers ways to build order-irrelevant dictionaries, using sorted, tuple, Counter, and/or frozenset:
Multiples-keys dictionary where key order doesn't matter
However, no apparent solutions jump out at me for using these datatypes and functions with Pandas conversion methods.
The next idea would be to convert the dictionary keys after the dataframe has been converted.
I tried this
new_d = {frozenset(key): value for key, value in df_dict}
But got this error
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-49-6a3244440ac2> in <module>()
----> 1 new_d = {frozenset(key): value for key, value in df_dict}
2 new_d
<ipython-input-49-6a3244440ac2> in <dictcomp>(.0)
----> 1 new_d = {frozenset(key): value for key, value in df_dict}
2 new_d
TypeError: 'int' object is not iterable
CodePudding user response:
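You're forgetting to loop over df_dict.items() instead of just df_dict ;)
Iterating over a dictionary directly yields only its keys, so each tuple key such as (33, 21) gets unpacked into key and value, and frozenset(33) then fails because an int is not iterable.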
>>> new_d = {frozenset(key): value for key, value in df_dict.items()}
>>> new_d
{frozenset({10, 99}): 92,
frozenset({60, 76}): 54,
frozenset({6, 20}): 31,
frozenset({36, 46}): 31,
frozenset({3, 68}): 59}
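As a quick check, here is a minimal sketch showing that the converted dictionary can be queried in either order. The frame contents are assumed for illustration (fixed values taken from the question's printed output rather than random integers), so the result is reproducible.

```python
import pandas as pd

# Fixed sample data (assumed for illustration) instead of random integers.
df = pd.DataFrame({'A': [85, 88, 48], 'B': [33, 61, 78], 'C': [21, 46, 12]})
df_dict = df.set_index(['B', 'C']).to_dict()['A']

# Iterate over (key, value) pairs; iterating over the dict alone yields only keys.
new_d = {frozenset(key): value for key, value in df_dict.items()}

# Both orderings build the same frozenset, so both lookups succeed.
first = new_d[frozenset((33, 21))]
second = new_d[frozenset((21, 33))]
print(first, second)
```

Note that the lookup key has to be wrapped in frozenset as well; a plain tuple will not match.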
CodePudding user response:
Why not create it directly from the DataFrame?
d = dict(zip(df[['B', 'C']].apply(frozenset, axis=1), df['A']))
d
{frozenset({72, 12}): 34, frozenset({98, 76}): 82, frozenset({67, 7}): 35, frozenset({60, 70}): 18, frozenset({8, 53}): 81}
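One caveat with frozenset keys: a frozenset drops repeated values, so a row where B equals C produces a one-element key, and two rows holding the same pair in opposite order map to the same key, with the later row overwriting the earlier one. If repeats need to be preserved, a sorted tuple (one of the options listed in the linked question) may be a better fit, provided the values can be compared with each other. The sketch below uses assumed sample data, and `unordered_key` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

def unordered_key(*values):
    """Hypothetical helper: a canonical key that ignores order but keeps repeats."""
    return tuple(sorted(values))

# Assumed sample data; the second row has B == C.
df = pd.DataFrame({'A': [85, 88], 'B': [33, 7], 'C': [21, 7]})

# tolist() converts the columns to native Python ints before building keys.
a_vals, b_vals, c_vals = df['A'].tolist(), df['B'].tolist(), df['C'].tolist()

# frozenset collapses the repeated 7 into a single element.
fs_keys = [frozenset(pair) for pair in zip(b_vals, c_vals)]

# Sorted tuples keep both elements and still ignore order.
d = {unordered_key(b, c): a for a, b, c in zip(a_vals, b_vals, c_vals)}

print(len(fs_keys[1]))
print(d[unordered_key(21, 33)], d[unordered_key(33, 21)])
print(d[unordered_key(7, 7)])
```

With this approach, lookups go through the same helper so the key is always built in sorted order.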