Home > Software design >  Create nested dictionary based on column values from Pandas dataframe?
Create nested dictionary based on column values from Pandas dataframe?

Time:06-09

I have a dataframe

df1 = pd.DataFrame(data={'col1': [21, 44, 28, 32, 20, 39, 42], 
                         'col2': ['<1', '>2', '>2', '>3', '<1', '>2', '>4'], 
                         'col3': ['yes', 'yes', 'no', 'no', 'yes', 'no', 'yes'], 
                         'col4': [1, 1, 0, 0, 1, 0, 1], 
                         'Group': [0, 2, 1, 1, 0, 1, 2] })
   col1 col2 col3  col4  Group
0    21   <1  yes     1      0
1    44   >2  yes     1      2
2    28   >2   no     0      1
3    32   >3   no     0      1
4    20   <1  yes     1      0
5    39   >2   no     0      1
6    42   >4  yes     1      2

Based on 'Group' column values 0,1,2 I need to create a nested dictionary like

{0: {'col1': {'mean': 20.5, 'sd': 0.5},
  'col2': ['<1'],
  'col3': ['yes'],
  'col4': {'mean': 1, 'sd': 0}},
 1: {'col1': {'mean': 33, 'sd': 4.54},
  'col2': ['>2', '>3'],
  'col3': ['no'],
  'col4': {'mean': 0, 'sd': 0}},
 2: {'col1': {'mean': 43, 'sd': 1},
  'col2': ['>2', '>4'],
  'col3': ['yes'],
  'col4': {'mean': 1, 'sd': 0}}}

col1 and col4 are numeric, col2 and col3 are string

For Group 0:

'col1' : mean and stddev of values 20,21

'col2' : which has only '<1'

'col3' : has only 'yes'

'col4' : mean and stddev of values 1,1

For Group 1:

'col1' : mean and stddev of values 28,32,39

'col2' : which has '>2' and '>3'

'col3' : has only 'no'

'col4' : mean and stddev of values 0,0,0

For Group 2:

'col1' : mean and stddev of values 42,44

'col2' : which has '>2' and '>4'

'col3' : has only 'yes'

'col4' : mean and stddev of values 1,1

CodePudding user response:

df1.groupby('Group').agg(lambda x: x.unique().tolist() if x.dtype =='object' 
                         else {'mean':x.mean(), 'std':x.std(ddof=0)}).T.to_dict('index')

Out[396]: 
{0: {'col1': {'mean': 20.5, 'std': 0.5},
  'col2': ['<1'],
  'col3': ['yes'],
  'col4': {'mean': 1.0, 'std': 0.0}},
 1: {'col1': {'mean': 33.0, 'std': 4.546060565661952},
  'col2': ['>2', '>3'],
  'col3': ['no'],
  'col4': {'mean': 0.0, 'std': 0.0}},
 2: {'col1': {'mean': 43.0, 'std': 1.0},
  'col2': ['>2', '>4'],
  'col3': ['yes'],
  'col4': {'mean': 1.0, 'std': 0.0}}}
  • Related