Home > Enterprise >  JSON not splitted in different cells
JSON not splitted in different cells

Time:03-29

I have a problem. I have a json that contain logs. Each logs could have a change but does not have to. So I want to show the logs with the changes in a pandas. But unfortunately the changes is not splitted in different cells (e.g. id and operation). So the complete changes is one cell. How could I get the following output like below?

{'model': 
     {'logs': 
          [{'id': '001',
            'changes': 
                      [{'id': '101','operation': 'UPDATE'}]},
                        {'id': '0000000017b068fc',
             'changes': 
                      [{'id': '201','operation': 'UPDATE'}]},
           {'id': '002',
              'changes': 
                      [{'id': '102','operation': 'UPDATE'},
                       {'id': '202','operation': 'UPDATE'},
                       {'id': '302','operation': 'UPDATE'}]},
          {'id': '003', 
               'changes': 
                      []},
          {'id': '004', 
                'changes': 
                       []},

Code

import pandas as pd
df=pd.json_normalize(result['model']['logs'])

What I got

[OUT]
id  changes
0   001 [{'id': '101'...
1   002 [{'id': '102'...
3   003 []

What I want


[OUT]

     id     changes.id    changes.operation
0    001    101           UPDATE
1    001    201           UPDATE
2    002    102           UPDATE
3    002    202           UPDATE
4    002    302           UPDATE
5    003    None          None
6    004    None          None

CodePudding user response:

Try:

out = pd.json_normalize(result['model']['logs'], 'changes', 'id', 
                        record_prefix='changes.', errors='ignore')
print(out)

# Output
  changes.id changes.operation                id
0        101            UPDATE               001
1        201            UPDATE  0000000017b068fc
2        102            UPDATE               002
3        202            UPDATE               002
4        302            UPDATE               002

Update

If you need to keep NaN rows, use:

out = pd.json_normalize(result['model']['logs']).explode('changes')
changes = pd.DataFrame(out.pop('changes').to_dict()).T.add_prefix('changes.')
out = pd.concat([out, changes], axis=1)
print(out)

# Output
                 id changes.id changes.operation
0               001        101            UPDATE
1  0000000017b068fc        201            UPDATE
2               002        302            UPDATE
2               002        302            UPDATE
2               002        302            UPDATE
3               003        NaN               NaN
4               004        NaN               NaN
  • Related