I have a DataFrame with 4 columns. I want to use 2 of the columns as keys for a dictionary of dictionaries, where each innermost value holds the remaining 2 columns (so each value is a DataFrame):
birdies = pd.DataFrame({'Habitat' : ['Captive', 'Wild', 'Captive', 'Wild'],
'Animal': ['Falcon', 'Falcon','Parrot', 'Parrot'],
'Max Speed': [380., 370., 24., 26.],
'Color': ["white", "grey", "green", "blue"]})
# this should output the speed and color
birdies_dict["Falcon"]["Wild"]
# this should contain a dictionary whose keys are 'Captive' and 'Wild'
birdies_dict["Falcon"]
I have found a way to generate a dictionary of dataframes with a single column as a key, but not with 2 columns:
birdies_dict = {k:table for k,table in birdies.groupby("Animal")}
CodePudding user response:
Call to_dict on each group inside the comprehension:
birdies_dict = {k:d.to_dict() for k,d in birdies.groupby('Animal')}
birdies_dict['Falcon']['Habitat']
Output:
{0: 'Captive', 1: 'Wild'}
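Note that a plain to_dict() keys each group by column name and then by row index, so 'Habitat' is not an inner key yet. A minimal sketch of one way to get habitat-keyed inner dictionaries, assuming each Animal/Habitat pair occurs only once (as in the sample data), is to index each group by Habitat and use orient='index':

```python
import pandas as pd

birdies = pd.DataFrame({'Habitat': ['Captive', 'Wild', 'Captive', 'Wild'],
                        'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                        'Max Speed': [380., 370., 24., 26.],
                        'Color': ["white", "grey", "green", "blue"]})

# Outer key: Animal. Inner key: Habitat. Innermost value: {column: value}.
# Assumes Habitat is unique within each Animal group; to_dict(orient='index')
# raises a ValueError on a non-unique index.
birdies_dict = {
    animal: group.drop(columns='Animal').set_index('Habitat').to_dict(orient='index')
    for animal, group in birdies.groupby('Animal')
}

print(birdies_dict['Falcon']['Wild'])
print(list(birdies_dict['Falcon']))
```

Here the innermost values are plain dictionaries rather than DataFrames, which may or may not be what is wanted.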
Or do you mean:
out = birdies.set_index(['Animal','Habitat'])
out.loc[('Falcon','Captive')]
which gives:
Max Speed 380
Color white
Name: (Falcon, Captive), dtype: object
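With the MultiIndex approach, the two lookups from the question can be mimicked through partial indexing. A small sketch, assuming the same sample data and that sorting the index is acceptable:

```python
import pandas as pd

birdies = pd.DataFrame({'Habitat': ['Captive', 'Wild', 'Captive', 'Wild'],
                        'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                        'Max Speed': [380., 370., 24., 26.],
                        'Color': ["white", "grey", "green", "blue"]})

out = birdies.set_index(['Animal', 'Habitat']).sort_index()

# Like birdies_dict["Falcon"]: a DataFrame indexed by Habitat
falcon = out.loc['Falcon']

# Like birdies_dict["Falcon"]["Wild"]: a Series with 'Max Speed' and 'Color'
falcon_wild = out.loc[('Falcon', 'Wild')]

print(falcon)
print(falcon_wild)
```

This keeps everything in a single DataFrame instead of building nested dictionaries.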
CodePudding user response:
IIUC:
birdies_dict = {k: {habitat: sub[['Max Speed', 'Color']].to_numpy() for habitat, sub in table.groupby('Habitat')} for k, table in birdies.groupby("Animal")}
OR
birdies_dict = {k: {habitat: sub[['Max Speed', 'Color']] for habitat, sub in table.groupby('Habitat')} for k, table in birdies.groupby("Animal")}
# in this case each inner key maps to a DataFrame
OUTPUT (for the first version):
Outer_key: Falcon
inner_key: Captive
Type: <class 'numpy.ndarray'>
Data
[[380.0 'white']]
--------------------
inner_key: Wild
Type: <class 'numpy.ndarray'>
Data
[[370.0 'grey']]
--------------------
==================================================
Outer_key: Parrot
inner_key: Captive
Type: <class 'numpy.ndarray'>
Data
[[24.0 'green']]
--------------------
inner_key: Wild
Type: <class 'numpy.ndarray'>
Data
[[26.0 'blue']]
--------------------
==================================================