Converting json data in dataframe-CodePudding

I'm analyzing club participation. Getting data as json through url request. This is the json I get and load with json_loads:

df = [{"club_id":"1234", "sum_totalparticipation":227, "level":1, "idsubdatatable":1229, "segment": "club_id==1234;eventName==national%20participation,eventName==local%20partipation,eventName==global%20participation", "subtable":[{"label":"national participation", "sum_events_totalevents":105,"level":2},{"label":"local participation","sum_events_totalevents":100,"level":2},{"label":"global_participation","sum_events_totalevents":22,"level":2}]}]

when I use json_normalize, this is how df looks:

so, specific participations are aggregated and only sum is available, and I need them flatten, with global/national/local participation in separate rows. Can you help by providing code?

CodePudding user response：

If you want to see the details of the subtable field (which is another list of dictionaries itself), then you can do the following:

...
    
df = pd.DataFrame(*data)

for i in range(len(df)):
    df.loc[i, 'label'] = df.loc[i, 'subtable']['label']
    df.loc[i, 'sum_events_totalevents'] = df.loc[i, 'subtable']['sum_events_totalevents']
    df.loc[i, 'sublevel'] = int(df.loc[i, 'subtable']['level'])

Note: I purposely renamed the level field inside the subtable as sublevel, the reason is there is already a column named level in the dataframe, and thus avoiding name conflict

CodePudding user response：

The data you show us after your json.load looks quite dirty, some quotes look missing, especially after "segment":"club_id==1234", and the ; separator at the beginning does not fit the keys separator inside a dict.

Nonetheless, let's consider the data you get is supposed to look like this (a list of dictionaries):

import pandas as pd

data = [{"club_id":"1234", "sum_totalparticipation":227, "level":1, "idsubdatatable":1229, "segment": "club_id==1234;eventName==national%20participation,eventName==local%20partipation,eventName==global%20participation", "subtable":[{"label":"national participation", "sum_events_totalevents":105,"level":2},{"label":"local participation","sum_events_totalevents":100,"level":2},{"label":"global_participation","sum_events_totalevents":22,"level":2}]}]

You can see the result with rows separated by unpacking your data inside a DataFrame:

df= pd.DataFrame(*data)

This is the table we get: View of the DataFrame from PyCharm

Hope this helps