Dataframe to nested JSON (grouped by row entry)


I have a Python pandas DataFrame which should be converted to nested JSON, grouped by the entry found in the column 'laterality'.

My dataframe looks like this:

import pandas as pd

df1 = pd.DataFrame({'id': [1, 1, 1, 2],
                   'laterality': ['L', 'L', 'R', 'R'],
                   'image': ['im1', 'im2', 'im3', 'im4'],
                   'number': [5, 5, 5, 6] })

The desired JSON output then is:

[
  {
    "id": 1,
    "number": 5
    "images L": ["im1", "im2"],
    "images R": ["im3"]    
  },
  {
    "id": 2,
    "number": 6,
    "images R": ["im4"]    
  },
...
]

What I tried: I tried to group records with the .to_dict() method as shown below. Unfortunately this didn't solve my problem yet. Since I'm currently stuck, I'm eager to learn from your advice; I'm looking for a push in the right direction.

json1 = (df1.groupby(['id', 'number'])
            .apply(lambda x: x[['laterality', 'image']].to_dict('records'))
            .reset_index()
            .rename(columns={0: 'image'})
            .to_json(orient='records'))
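
For reference, this produces a single nested list of laterality/image records per group, rather than separate per-laterality lists, i.e. roughly:

[{"id":1,"number":5,"image":[{"laterality":"L","image":"im1"},{"laterality":"L","image":"im2"},{"laterality":"R","image":"im3"}]},{"id":2,"number":6,"image":[{"laterality":"R","image":"im4"}]}]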

Some related topics I looked at:

more pythonic way to format a JSON string from a list of tuples

Convert Pandas Dataframe to nested JSON

Convert Pandas DataFrame to JSON format

CodePudding user response:

You can use pivot_table with ['id', 'number'] as the index, 'laterality' as the columns, 'image' as the values, and list as the aggregation function. That groups the images per laterality and turns each laterality into its own column:

import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2],
                   'laterality': ['L', 'L', 'R', 'R'],
                   'image': ['im1', 'im2', 'im3', 'im4'],
                   'number': [5, 5, 5, 6] })

df_out = df.pivot_table(columns='laterality', index=['id','number'], values='image', aggfunc=list)

df_out.columns = "images "   df_out.columns

df_out.reset_index().to_json(orient='records')

Edit: replaced the earlier groupby + pivot approach with pivot_table and aggfunc.

Output:

[{"id":1,"number":5,"images L":["im1","im2"],"images R":["im3"]},{"id":2,"number":6,"images L":null,"images R":["im4"]}]

CodePudding user response:

It is simple, and often faster, to just iterate over the grouped DataFrame:

out = []
for (group_id, num), group in df1.groupby(['id', 'number']):
    out.append({
        "id": group_id,
        "number": num,
        # collect the image names for each laterality within this id/number group
        "images L": group.loc[group.laterality == 'L', 'image'].tolist(),
        "images R": group.loc[group.laterality == 'R', 'image'].tolist()
    })

Output

[{'id': 1, 'number': 5, 'images L': ['im1', 'im2'], 'images R': ['im3']},
 {'id': 2, 'number': 6, 'images L': [], 'images R': ['im4']}]
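
If a JSON string is needed rather than a list of dicts, out can be serialized with the standard library json module. A minimal sketch; note that the groupby keys here are NumPy integers, which json.dumps cannot serialize directly, so default=int converts them:

import json

# default=int converts the NumPy int64 group keys to plain ints for serialization
json_str = json.dumps(out, default=int)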