I am working on extracting JSON array column in dataframe in Python using Pandas Library, where I have a data like this
>df
id partnerid payments
5263 org1244 [{"sNo": 1, "amount":"1000"}, {"sNo": 2, "amount":"500"}]
5264 org1245 [{"sNo": 1, "amount":"2000"}, {"sNo": 2, "amount":"600"}]
5265 org1246 [{"sNo": 1, "amount":"3000"}, {"sNo": 2, "amount":"700"}]
I want to extract the JSON data inside the list and add it as a column in same dataframe like this
>mod_df
id partnerid sNo amount
5263 org1244 1 1000
5263 org1244 2 500
5264 org1245 1 2000
5264 org1245 2 600
5265 org1246 1 3000
5265 org1246 2 700
I have tried with this approach
import pandas as pd
import json as j
df = pd.read_parquet('sample.parquet')
js_loads = df['payments'].apply(j.loads)
js_list = list(js_loads)
j_data = j.dumps(js_list)
df = df.join(pd.read_json(j_data))
df = df.drop(columns=['payments'] , axis=1)
But this works, only if we have JSON data in column not list of JSON. Can someone explain, how can I achieve my desired output?
CodePudding user response:
Convert it to list
by ast.literal_eval
and use explode()
to transform each element to a row and also replicate the other columns.
Then, use .apply(pd.Series)
to convert dict-like to series
.
Finally, concatenate to original dataframe using pd.concat()
.
Example:
import ast
# sample data
d = {'col1': [0, 1, 2], 'payments': ['[{"sNo": 1, "amount":"1000"}, {"sNo": 2, "amount":"500"}]', '[{"sNo": 1, "amount":"2000"}, {"sNo": 2, "amount":"600"}]', '[{"sNo": 1, "amount":"3000"}, {"sNo": 2, "amount":"700"}]']}
df = pd.DataFrame(data=d, index=[0, 1, 2])
df['payments'] = df['payments'].apply(ast.literal_eval)
df = df.explode('payments')
out = pd.concat([df.drop(['payments'], axis=1), df['payments'].apply(pd.Series)], axis=1).reset_index(drop=True)
output:
col1 sNo amount 0 0 1 1000 1 0 2 500 2 1 1 2000 3 1 2 600 4 2 1 3000 5 2 2 700