I have data frame (df)
as shown below
Input
Df
ID linkedShipments
12 [{'SID': 'GBDXY551', 'Code': 'GBDXY55', 'Num': '2021121'}, {'SID': 'GBDXY551', 'Code': 'GBDXY55', 'Num': '20211215'}]
32 [{'SID': 'CHSGI422', 'Code': 'CHSGI421', 'Num': '4024421'}, {'SID': 'GBDXY551', 'Code': 'GBDXY55', 'Num': '20211222'}]
Expected Output
ID SID Code Num
12 GBDXY551 GBDXY55 2021121
12 GBDXY551 GBDXY55 20211215
32 CHSGI422 CHSGI421 4024421
32 GBDXY551 GBDXY55 20211222
How can this be done in pandas?
CodePudding user response:
The easiest way is:
- json.loads each row of linkedShipments
- traversal all element in linkedShipments
- combine ID and each element of linkedShipments as row dict
- append to new data
And the python code is below
import pandas as pd
import json
#The simulation of data
data = {"ID": [12, 24],
"linkedShipments": ["[{'SID': 'GBDXY551', 'Code': 'GBDXY55', 'Num': '2021121'}, {'SID': 'GBDXY551', 'Code': 'GBDXY55', 'Num': '20211215'}]",
"[{'SID': 'CHSGI422', 'Code': 'CHSGI421', 'Num': '4024421'}, {'SID': 'GBDXY551', 'Code': 'GBDXY55', 'Num': '20211222'}]"]
}
#load as dataframe
df = pd.DataFrame(data)
new_data = {} #define new data
#treverse all rows in current data
for index, row in df.iterrows():
#json only accept double quotes, so convert singal quotes to double quotes
shipment_dict_list = json.loads(row['linkedShipments'].replace("\'", "\""))
for shipment_dict in shipment_dict_list:
new_data.setdefault("ID",[]).append(row['ID'])
for key in shipment_dict:
new_data.setdefault(key,[]).append(shipment_dict[key])
print(pd.DataFrame(new_data))
Here is the result:
ID SID Code Num
0 12 GBDXY551 GBDXY55 2021121
1 12 GBDXY551 GBDXY55 20211215
2 24 CHSGI422 CHSGI421 4024421
3 24 GBDXY551 GBDXY55 20211222