I'm analyzing club participation. Getting data as json through url request. This is the json I get and load with json_loads
:
df = [{"club_id":"1234", "sum_totalparticipation":227, "level":1, "idsubdatatable":1229, "segment": "club_id==1234;eventName==national%20participation,eventName==local%20partipation,eventName==global%20participation", "subtable":[{"label":"national participation", "sum_events_totalevents":105,"level":2},{"label":"local participation","sum_events_totalevents":100,"level":2},{"label":"global_participation","sum_events_totalevents":22,"level":2}]}]
when I use json_normalize, this is how df looks:
so, specific participations are aggregated and only sum is available, and I need them flatten, with global/national/local participation in separate rows. Can you help by providing code?
CodePudding user response:
If you want to see the details of the subtable field (which is another list of dictionaries itself), then you can do the following:
...
df = pd.DataFrame(*data)
for i in range(len(df)):
df.loc[i, 'label'] = df.loc[i, 'subtable']['label']
df.loc[i, 'sum_events_totalevents'] = df.loc[i, 'subtable']['sum_events_totalevents']
df.loc[i, 'sublevel'] = int(df.loc[i, 'subtable']['level'])
Note: I purposely renamed the level
field inside the subtable as sublevel
, the reason is there is already a column named level
in the dataframe, and thus avoiding name conflict
CodePudding user response:
The data you show us after your json.load looks quite dirty, some quotes look missing, especially after "segment":"club_id==1234
", and the ;
separator at the beginning does not fit the keys separator inside a dict
.
Nonetheless, let's consider the data you get is supposed to look like this (a list of dictionaries):
import pandas as pd
data = [{"club_id":"1234", "sum_totalparticipation":227, "level":1, "idsubdatatable":1229, "segment": "club_id==1234;eventName==national%20participation,eventName==local%20partipation,eventName==global%20participation", "subtable":[{"label":"national participation", "sum_events_totalevents":105,"level":2},{"label":"local participation","sum_events_totalevents":100,"level":2},{"label":"global_participation","sum_events_totalevents":22,"level":2}]}]
You can see the result with rows separated by unpacking your data inside a DataFrame:
df= pd.DataFrame(*data)
This is the table we get: View of the DataFrame from PyCharm
Hope this helps