Home > database >  Nested Json in to dataframe (pandas)
Nested Json in to dataframe (pandas)

Time:04-27

I`m importing the following json. It is not super extensive but I just added a small piece here for demonstration:

{'request': {'Target': 'Offer',
  'Format': 'json',
  'Service': 'XXXXX',
  'Version': '2',
  'NetworkToken': 'uu238939871',
  'Method': 'findAll',
  'fields': ['preview_url',
   'sub1',
   'sub2',
   'sub3',
   'sub4',
   'sub5',
   'sub6',
   'sub7',
   'sub8',
   'sub9',
   'sub10',
   'sub11',
   'sub12'],
  'filters': {'status': 'active', 'advertiser_id': '626'}},
 'response': {'status': 1,
  'httpStatus': 200,
  'data': {'231': {'Offer': {'sub1': '231',
     'sub2': '626',
     'sub3': 'XXXXXXXX',
     'sub4': 'XXXXXXX',
     'sub5': 'https://play.google.com/store/apps',
     'sub6': 'https://XXXXX',
     'sub7': '0.39',
     'sub8': 'active',
     'sub9': 'None',
     'sub10': '0',
     'sub11': '0',
     'sub12': '1',
     'sub13': 'XXXXX'}},
   '216': {'Offer': {'sub1': '216',
     'sub2': '626',
     'sub3': 'XXXXXXXX',
     'sub4': '682666',
     'sub5': 'https://play.google.com/store/appsg',
     'sub6': 'https://XXXXXXX',
     'sub7': '1.96000',
     'sub8': 'active',
     'sub9': 'None',
     'sub10': '0',
     'sub11': '0',
     'sub12': '1',
     'sub13': 'XXXXX'}}
 'errors': [],
  'errorMessage': None}}

Im trying to get this into a data frame by using df = pd.json_normalize(data) however the only thing Im getting is a df with 1 row and more than 400 columns, which is not ideal. I also tried to use flatten but got an error where it just says 'Data'

from flatten_json import flatten
dic_flattened = (flatten(d, '.') for d in data['data'])
df = pd.DataFrame(dic_flattened)

The final data frame should have each sub in a column and the fields in rows, for example:

Sub1 Sub2 Sub3
231 626 XXXXXXXX
216 626 XXXXXXXX

CodePudding user response:

It's simpler if you reformat the dictionary in Python first before constructing the DataFrame. Since you want the Sub keys, you can simply extract them using a list comprehension:

out = pd.DataFrame([v['Offer'] for v in my_data['response']['data'].values() if isinstance(v, dict)])

Output:

  sub1 sub2      sub3     sub4                                 sub5             sub6     sub7    sub8  sub9 sub10 sub11 sub12  sub13
0  231  626  XXXXXXXX  XXXXXXX   https://play.google.com/store/apps    https://XXXXX     0.39  active  None     0     0     1  XXXXX
1  216  626  XXXXXXXX   682666  https://play.google.com/store/appsg  https://XXXXXXX  1.96000  active  None     0     0     1  XXXXX
  • Related