Home > Blockchain >  Converting json data in dataframe
Converting json data in dataframe

Time:11-17

I'm analyzing club participation. Getting data as json through url request. This is the json I get and load with json_loads:

df = [{"club_id":"1234", "sum_totalparticipation":227, "level":1, "idsubdatatable":1229, "segment": "club_id==1234;eventName==national%20participation,eventName==local%20partipation,eventName==global%20participation", "subtable":[{"label":"national participation", "sum_events_totalevents":105,"level":2},{"label":"local participation","sum_events_totalevents":100,"level":2},{"label":"global_participation","sum_events_totalevents":22,"level":2}]}]

when I use json_normalize, this is how df looks:

normalized data frame

so, specific participations are aggregated and only sum is available, and I need them flatten, with global/national/local participation in separate rows. Can you help by providing code?

CodePudding user response:

If you want to see the details of the subtable field (which is another list of dictionaries itself), then you can do the following:

...
    
df = pd.DataFrame(*data)

for i in range(len(df)):
    df.loc[i, 'label'] = df.loc[i, 'subtable']['label']
    df.loc[i, 'sum_events_totalevents'] = df.loc[i, 'subtable']['sum_events_totalevents']
    df.loc[i, 'sublevel'] = int(df.loc[i, 'subtable']['level'])

Note: I purposely renamed the level field inside the subtable as sublevel, the reason is there is already a column named level in the dataframe, and thus avoiding name conflict

CodePudding user response:

The data you show us after your json.load looks quite dirty, some quotes look missing, especially after "segment":"club_id==1234", and the ; separator at the beginning does not fit the keys separator inside a dict.

Nonetheless, let's consider the data you get is supposed to look like this (a list of dictionaries):

import pandas as pd

data = [{"club_id":"1234", "sum_totalparticipation":227, "level":1, "idsubdatatable":1229, "segment": "club_id==1234;eventName==national%20participation,eventName==local%20partipation,eventName==global%20participation", "subtable":[{"label":"national participation", "sum_events_totalevents":105,"level":2},{"label":"local participation","sum_events_totalevents":100,"level":2},{"label":"global_participation","sum_events_totalevents":22,"level":2}]}]

You can see the result with rows separated by unpacking your data inside a DataFrame:

df= pd.DataFrame(*data)

This is the table we get: View of the DataFrame from PyCharm

Hope this helps

  • Related