Home > Back-end >  Extract the JSON data inside the array column in Dataframe in Pandas
Extract the JSON data inside the array column in Dataframe in Pandas

Time:03-04

I am working on extracting JSON array column in dataframe in Python using Pandas Library, where I have a data like this

>df

id      partnerid   payments

5263    org1244     [{"sNo": 1, "amount":"1000"}, {"sNo": 2, "amount":"500"}]
5264    org1245     [{"sNo": 1, "amount":"2000"}, {"sNo": 2, "amount":"600"}]
5265    org1246     [{"sNo": 1, "amount":"3000"}, {"sNo": 2, "amount":"700"}]

I want to extract the JSON data inside the list and add it as a column in same dataframe like this

>mod_df

id      partnerid   sNo amount

5263    org1244     1   1000
5263    org1244     2   500
5264    org1245     1   2000
5264    org1245     2   600
5265    org1246     1   3000
5265    org1246     2   700

I have tried with this approach

import pandas as pd
import json as j

df = pd.read_parquet('sample.parquet')

js_loads = df['payments'].apply(j.loads)
js_list = list(js_loads)
j_data = j.dumps(js_list)
df = df.join(pd.read_json(j_data))
df = df.drop(columns=['payments'] , axis=1)

But this works, only if we have JSON data in column not list of JSON. Can someone explain, how can I achieve my desired output?

CodePudding user response:

Convert it to list by ast.literal_eval and use explode()to transform each element to a row and also replicate the other columns.

Then, use .apply(pd.Series) to convert dict-like to series.

Finally, concatenate to original dataframe using pd.concat().


Example:

import ast

# sample data
d = {'col1': [0, 1, 2], 'payments': ['[{"sNo": 1, "amount":"1000"}, {"sNo": 2, "amount":"500"}]', '[{"sNo": 1, "amount":"2000"}, {"sNo": 2, "amount":"600"}]', '[{"sNo": 1, "amount":"3000"}, {"sNo": 2, "amount":"700"}]']}
df = pd.DataFrame(data=d, index=[0, 1, 2])

df['payments'] = df['payments'].apply(ast.literal_eval)
df = df.explode('payments') 
out = pd.concat([df.drop(['payments'], axis=1), df['payments'].apply(pd.Series)], axis=1).reset_index(drop=True)

output:

  col1 sNo amount
0 0   1   1000
1 0   2   500
2 1   1   2000
3 1   2   600
4 2   1   3000
5 2   2   700
  • Related