I have a dataset in which one column contains a JSON dictionary. I want to parse this column into new columns based on its keys. The column's dtype is object and df.iloc gives me a string, so I couldn't figure out how to handle it. I tried json_normalize and tolist, but neither worked.
     Unnamed: 0                       _id                                         userInputs                              sessionID
222         222  5bc915caf9af8b0dad3c0660  [{'userID': 22, 'milesRequested': 170, 'WhPerM...  2_39_88_24_2018-04-30 15:07:48.608581
and the first value of userInputs is a string:
c.iloc[0]['userInputs']
"[{'userID': 22, 'milesRequested': 170, 'WhPerMile': 350, 'minutesAvailable': 550, 'modifiedAt': 'Mon, 30 Apr 2018 15:08:54 GMT', 'paymentRequired': True, 'requestedDeparture': 'Tue, 01 May 2018 00:17:49 GMT', 'kWhRequested': 59.5}]"
So userID, milesRequested, etc. should each become a new column holding the corresponding values, for every row of the dataset.
CodePudding user response:
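First, apply ast.literal_eval to the column to convert each string into a Python object (the values use single quotes and True, so they are Python literals rather than valid JSON). Then expand the resulting list of dicts into DataFrame columns: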
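from ast import literal_eval
import pandas as pd
# Parse each string into a Python list of dicts
df["userInputs"] = df["userInputs"].apply(literal_eval)
# Give every dict in the list its own row
df = df.explode("userInputs")
# Expand each dict into columns and attach them to the remaining columns
df = pd.concat([df, df.pop("userInputs").apply(pd.Series)], axis=1)
print(df)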
Prints:
   _id  userID  milesRequested  WhPerMile  minutesAvailable                     modifiedAt  paymentRequired             requestedDeparture  kWhRequested
0  xxx      22             170        350               550  Mon, 30 Apr 2018 15:08:54 GMT             True  Tue, 01 May 2018 00:17:49 GMT          59.5
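As a side note on why literal_eval is used here instead of a JSON parser, the following small sketch compares the two on one cell. The string is a shortened copy of the value from the question (three keys only, which is my own trimming for brevity), and the variable names are illustrative.

```python
import json
from ast import literal_eval

# One cell of the column, shortened to three keys for readability.
raw = "[{'userID': 22, 'paymentRequired': True, 'kWhRequested': 59.5}]"

# json.loads rejects the string: JSON requires double-quoted keys
# and lowercase true/false, while this text uses Python syntax.
try:
    json.loads(raw)
    parsed_as_json = True
except json.JSONDecodeError:
    parsed_as_json = False

# literal_eval reads it as a Python literal: a list holding one dict.
records = literal_eval(raw)
print(parsed_as_json, type(records).__name__, records[0]["userID"])
```

If the column had been written with json.dumps in the first place, json.loads would be the natural choice instead.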
DataFrame used:
_id userInputs
0 xxx [{'userID': 22, 'milesRequested': 170, 'WhPerMile': 350, 'minutesAvailable': 550, 'modifiedAt': 'Mon, 30 Apr 2018 15:08:54 GMT', 'paymentRequired': True, 'requestedDeparture': 'Tue, 01 May 2018 00:17:49 GMT', 'kWhRequested': 59.5}]
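Since the question mentions json_normalize, here is a possible variant of the same idea that uses it for the last step. This is only a sketch under a few assumptions: the frame is rebuilt by hand with a subset of the keys, every list in userInputs is non-empty, and the names expanded and result are made up for illustration.

```python
from ast import literal_eval

import pandas as pd

# Rebuild a one-row example frame; the cell is a string, as in the question.
df = pd.DataFrame({
    "_id": ["xxx"],
    "userInputs": [
        "[{'userID': 22, 'milesRequested': 170, 'WhPerMile': 350, "
        "'minutesAvailable': 550, 'paymentRequired': True, 'kWhRequested': 59.5}]"
    ],
})

# String -> list of dicts, then one row per dict with a clean 0..n-1 index.
df["userInputs"] = df["userInputs"].apply(literal_eval)
df = df.explode("userInputs").reset_index(drop=True)

# json_normalize builds a frame from the list of dicts in one call;
# it assumes no row is left empty (NaN) after explode.
expanded = pd.json_normalize(df.pop("userInputs").tolist())
result = df.join(expanded)
print(result)
```

The reset_index call keeps the row labels unique, so the join lines up one-to-one even when a list holds more than one dict.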