Convert a pandas dataframe into a nested dictionary

Time:10-08

I have a dataframe that I would like to convert into a nested dictionary. For example:

df =

ID   Action                  Responsible      Phase
1.1  Request Document        Project Manager  1.0 Create Document Request
2.1  Create course module    Writer           2.0 Create Document
2.2  Send module for review  Writer           2.0 Create Document
3.1  Publish Course          Reviewers        3.0 Publish Document
3.2  Address feedback        Writer           3.0 Publish Document

Ultimately, I need to turn it into a nested dictionary that is something like this:

context = {'Section': [
    {'Phase': '1.0 Create Document Request',
     'Activity': [
         {'Responsible': 'Project Manager', 'ID': '1.1', 'Action': 'Request Document'},
     ]},
    {'Phase': '2.0 Create Document',
     'Activity': [
         {'Responsible': 'Writer', 'ID': '2.1', 'Action': 'Create course module'},
         {'Responsible': 'Writer', 'ID': '2.2', 'Action': 'Send module for review'},
     ]},
    {'Phase': '3.0 Publish Document',
     'Activity': [
         {'Responsible': 'Reviewers', 'ID': '3.1', 'Action': 'Publish Course'},
         {'Responsible': 'Writer', 'ID': '3.2', 'Action': 'Address feedback'},
     ]},
]}

I've thought of using df.groupby, to_dict and a lambda function, but I haven't figured out how to get it to work.

(Sorry, I know this isn't the cleanest code or example; I'm still learning)

EDIT:

The code I have tried is:

context = df.groupby('Phase')[['ID','Action','Responsible','Note','Output']].apply(lambda x: x.set_index('ID').to_dict(orient='index')).to_dict()

but that produces the wrong output: the dictionary is keyed by the 'Phase' values and then by 'ID', rather than by the 'Phase' and 'Activity' keys I need. Thinking about it more, what I really need is a list of dictionaries, one per 'Phase', each holding a nested list of that phase's rows under the 'Activity' key.

CodePudding user response:

You can use to_dict within groupby, then use to_dict on the result again to get nested records:

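from pprint import pprint

# orient='records' gives one dict per row; the short form 'r' is no longer
# accepted by recent pandas versions.
data = (df.drop('Phase', axis=1)
          .groupby(df['Phase'])
          .apply(lambda x: x.to_dict(orient='records'))
          .reset_index(name='Activity')
          .to_dict(orient='records'))

context = {'Section': data}
pprint(context)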
{'Section': [{'Activity': [{'Action': 'Request Document',
                            'ID': 1.1,
                            'Responsible': 'Project Manager'}],
              'Phase': '1.0 Create Document Request'},
             {'Activity': [{'Action': 'Create course module',
                            'ID': 2.1,
                            'Responsible': 'Writer'},
                           {'Action': 'Send module for review',
                            'ID': 2.2,
                            'Responsible': 'Writer'}],
              'Phase': '2.0 Create Document'},
             {'Activity': [{'Action': 'Publish Course',
                            'ID': 3.1,
                            'Responsible': 'Reviewers'},
                           {'Action': 'Address feedback',
                            'ID': 3.2,
                            'Responsible': 'Writer'}],
              'Phase': '3.0 Publish Document'}]}
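
For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the same idea. It iterates over the groups directly instead of using apply. The sample frame is rebuilt from the table in the question; storing the IDs as strings and the helper name build_context are assumptions made for illustration, not something from the original post.

```python
import pandas as pd

# Sample data mirroring the table in the question. IDs are stored as strings
# (an assumption) so they match the quoted IDs in the desired output.
df = pd.DataFrame({
    "ID": ["1.1", "2.1", "2.2", "3.1", "3.2"],
    "Action": [
        "Request Document",
        "Create course module",
        "Send module for review",
        "Publish Course",
        "Address feedback",
    ],
    "Responsible": ["Project Manager", "Writer", "Writer", "Reviewers", "Writer"],
    "Phase": [
        "1.0 Create Document Request",
        "2.0 Create Document",
        "2.0 Create Document",
        "3.0 Publish Document",
        "3.0 Publish Document",
    ],
})


def build_context(frame: pd.DataFrame) -> dict:
    """Group rows by 'Phase' and nest the remaining columns under 'Activity'.

    Hypothetical helper name, used only for this example.
    """
    sections = [
        {
            "Phase": phase,
            # One dict per row, without the column that was grouped on.
            "Activity": group.drop(columns="Phase").to_dict(orient="records"),
        }
        # sort=False keeps the phases in order of first appearance.
        for phase, group in frame.groupby("Phase", sort=False)
    ]
    return {"Section": sections}


context = build_context(df)
```

Looping over the groups keeps the 'Phase' and 'Activity' keys explicit, which may be easier to adapt if the real data has extra columns such as 'Note' or 'Output'.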