Split a dataframe into a dictionary of dictionaries

Time:10-22

I have a dataframe with 4 columns. I want to use 2 of the columns as keys for a dictionary of dictionaries, where each inner value holds the remaining 2 columns (so a dataframe).

import pandas as pd

birdies = pd.DataFrame({'Habitat': ['Captive', 'Wild', 'Captive', 'Wild'],
                        'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                        'Max Speed': [380., 370., 24., 26.],
                        'Color': ["white", "grey", "green", "blue"]})
# this should output speed and color
birdies_dict["Falcon"]["Wild"]
# this should contain a dictionary whose keys are 'Captive' and 'Wild'
birdies_dict["Falcon"]

I have found a way to generate a dictionary of dataframes with a single column as a key, but not with 2 columns:

birdies_dict = {k:table for k,table in birdies.groupby("Animal")}

CodePudding user response:

Call to_dict on each group inside the comprehension:

birdies_dict = {k:d.to_dict() for k,d in birdies.groupby('Animal')}
birdies_dict['Falcon']['Habitat']

Output:

{0: 'Captive', 1: 'Wild'}

Or do you mean:

out = birdies.set_index(['Animal','Habitat'])
out.loc[('Falcon','Captive')]

which gives:

Max Speed      380
Color        white
Name: (Falcon, Captive), dtype: object
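
If plain nested dictionaries are wanted rather than a MultiIndex lookup, the indexed frame can be walked row by row. This is only a minimal sketch: it assumes each (Animal, Habitat) pair occurs once (a repeated pair would overwrite the earlier row), and the names indexed and nested are made up for illustration.

```python
import pandas as pd

birdies = pd.DataFrame({'Habitat': ['Captive', 'Wild', 'Captive', 'Wild'],
                        'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                        'Max Speed': [380., 370., 24., 26.],
                        'Color': ["white", "grey", "green", "blue"]})

# Index by the two key columns; each row label is then an (animal, habitat) tuple.
indexed = birdies.set_index(['Animal', 'Habitat'])

# Fold every row into a dict of dicts: outer key = animal, inner key = habitat.
# Assumption: (animal, habitat) pairs are unique, otherwise later rows win.
nested = {}
for (animal, habitat), row in indexed.iterrows():
    nested.setdefault(animal, {})[habitat] = row.to_dict()

print(nested['Falcon']['Wild'])   # speed and color for wild falcons
print(list(nested['Falcon']))     # the habitat keys recorded for falcons
```

Here nested["Falcon"]["Wild"] is an ordinary dict with the keys 'Max Speed' and 'Color', so no pandas object is left in the result.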

CodePudding user response:

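If I understand correctly:

birdies_dict = {k:{habitat: table[['Max Speed', 'Color']].to_numpy() for habitat in table['Habitat'].to_numpy()} for k,table in birdies.groupby("Animal")}
# here each inner key holds a NumPy array

OR

birdies_dict = {k:{habitat: table[['Max Speed', 'Color']] for habitat in table['Habitat'].to_numpy()} for k,table in birdies.groupby("Animal")}
# in this case each inner key holds a dataframe

Note that in both versions every inner key receives all rows of its animal group, not only the rows for that habitat, as the output below shows.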

OUTPUT:

Outer_key:  Falcon
inner_key:  Captive
Type:  <class 'numpy.ndarray'>
Data
[[380.0 'white']
 [370.0 'grey']]
--------------------
inner_key:  Wild
Type:  <class 'numpy.ndarray'>
Data
[[380.0 'white']
 [370.0 'grey']]
--------------------
==================================================
Outer_key:  Parrot
inner_key:  Captive
Type:  <class 'numpy.ndarray'>
Data
[[24.0 'green']
 [26.0 'blue']]
--------------------
inner_key:  Wild
Type:  <class 'numpy.ndarray'>
Data
[[24.0 'green']
 [26.0 'blue']]
--------------------
==================================================
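
If each inner key should hold only the rows for its own habitat, grouping on both columns at once is one option. This is a sketch under that assumption, not necessarily what the answer above intended; by_pair is a hypothetical name.

```python
import pandas as pd

birdies = pd.DataFrame({'Habitat': ['Captive', 'Wild', 'Captive', 'Wild'],
                        'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                        'Max Speed': [380., 370., 24., 26.],
                        'Color': ["white", "grey", "green", "blue"]})

# Grouping by a list of two columns yields (animal, habitat) tuples as group keys,
# so each sub-frame contains only the rows for that exact pair.
by_pair = {}
for (animal, habitat), sub in birdies.groupby(['Animal', 'Habitat']):
    by_pair.setdefault(animal, {})[habitat] = sub[['Max Speed', 'Color']]

print(by_pair['Falcon']['Wild'])   # a one-row dataframe with speed and color
print(list(by_pair['Falcon']))     # the habitat keys for falcons
```

Because the inner values are dataframes, this also copes with several rows per (Animal, Habitat) pair; call .to_numpy() on the sub-frame if arrays are preferred.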