Home > Back-end >  How to convert nested comma separated column inside a pandas data frame to specific format in Python
How to convert nested comma separated column inside a pandas data frame to specific format in Python

Time:12-16

I have data frame (df) as shown below

Input

Df

ID  linkedShipments
12      [{'SID': 'GBDXY551', 'Code': 'GBDXY55', 'Num': '2021121'}, {'SID': 'GBDXY551', 'Code': 'GBDXY55', 'Num': '20211215'}]
32      [{'SID': 'CHSGI422', 'Code': 'CHSGI421', 'Num': '4024421'}, {'SID': 'GBDXY551', 'Code': 'GBDXY55', 'Num': '20211222'}]

Expected Output

ID  SID         Code     Num
12  GBDXY551    GBDXY55  2021121
12  GBDXY551    GBDXY55  20211215
32  CHSGI422    CHSGI421 4024421
32  GBDXY551    GBDXY55  20211222

How can this be done in pandas?

CodePudding user response:

The easiest way is:

  1. json.loads each row of linkedShipments
  2. traversal all element in linkedShipments
  3. combine ID and each element of linkedShipments as row dict
  4. append to new data

And the python code is below

import pandas as pd
import json

#The simulation of data
data = {"ID": [12, 24],
    "linkedShipments": ["[{'SID': 'GBDXY551', 'Code': 'GBDXY55', 'Num': '2021121'}, {'SID': 'GBDXY551', 'Code': 'GBDXY55', 'Num': '20211215'}]", 
    "[{'SID': 'CHSGI422', 'Code': 'CHSGI421', 'Num': '4024421'}, {'SID': 'GBDXY551', 'Code': 'GBDXY55', 'Num': '20211222'}]"]
   }
#load as dataframe
df = pd.DataFrame(data)

new_data = {} #define new data
#treverse all rows in current data
for index, row in df.iterrows():
    #json only accept double quotes, so convert singal quotes to double quotes
    shipment_dict_list = json.loads(row['linkedShipments'].replace("\'", "\"")) 
    for shipment_dict in shipment_dict_list:
        new_data.setdefault("ID",[]).append(row['ID'])
        for key in shipment_dict:
            new_data.setdefault(key,[]).append(shipment_dict[key])
print(pd.DataFrame(new_data))

Here is the result:

   ID       SID      Code       Num
0  12  GBDXY551   GBDXY55   2021121
1  12  GBDXY551   GBDXY55  20211215
2  24  CHSGI422  CHSGI421   4024421
3  24  GBDXY551   GBDXY55  20211222
  • Related