I have a python pandas dataframe, which should be converted to a nested JSON (based on the entry found in the column 'laterality').
My dataframe looks like this:
import pandas as pd
df1 = pd.DataFrame({'id': [1, 1, 1, 2],
'laterality': ['L', 'L', 'R', 'R'],
'image': ['im1', 'im2', 'im3', 'im4'],
'number': [5, 5, 5, 6] })
The desired JSON output then is:
[
{
"id": 1,
"number": 5
"images L": ["im1", "im2"],
"images R": ["im3"]
},
{
"id": 2,
"number": 6,
"images R": ["im4"]
},
...
]
What I tried: I tried to group records based on the .to_dict() method like below. Unfortunately this didn't solve my problem yet. Since I'm currently stuck, I am eager to learn from your advise. I'm looking for a push in the right direction.
json1 = (df1.groupby(['id','number'])
.apply(lambda x: x[['laterality','image']].to_dict('records'))
.reset_index()
.rename(columns={0:'image'})
.to_json(orient='records'))
Some related topics I looked at:
more pythonic way to format a JSON string from a list of tuples
Convert Pandas Dataframe to nested JSON
Convert Pandas DataFrame to JSON format
CodePudding user response:
You can group by ['id', 'number', 'laterality']
and aggregate with list
. After that pivot to get images as columns:
import pandas as pd
df = pd.DataFrame({'id': [1, 1, 1, 2],
'laterality': ['L', 'L', 'R', 'R'],
'image': ['im1', 'im2', 'im3', 'im4'],
'number': [5, 5, 5, 6] })
df_out = df.pivot_table(columns='laterality', index=['id','number'], values='image', aggfunc=list)
df_out.columns = "images " df_out.columns
df_out.reset_index().to_json(orient='records')
Edit: replacing groupby
pivot
by pivot_table
with aggfunc
Output:
[{"id":1,"number":5,"images L":["im1","im2"],"images R":["im3"]},{"id":2,"number":6,"images L":null,"images R":["im4"]}]
CodePudding user response:
It is simple and faster to just iterate the grouped dataframe
out = []
for (index, num), group in df1.groupby(['id', 'number']):
out.append({
"id": index,
"number": num,
"images L": group.loc[group.laterality == 'L', 'image'].tolist(),
"images R": group.loc[group.laterality == 'R', 'image'].tolist()
})
Output
[{'id': 1, 'number': 5, 'images L': ['im1', 'im2'], 'images R': ['im3']},
{'id': 2, 'number': 6, 'images L': [], 'images R': ['im4']}]