For the below DataFrame:
ID | Reg | Departure | Arrival | Date | Time |
---|---|---|---|---|---|
1 | ABC | DUB | LCY | 22/0/22 | 15:23 |
2 | ABC | DUB | LCY | 22/0/22 | 15:27 |
3 | CBA | CPH | HEL | 22/0/22 | 12:21 |
4 | CBA | CPH | HE | 22/0/22 | 12:19 |
I would like to pivot so that it so that the resulting DataFrame is:
ID | Reg | Departure | Arrival | Date | Time | ID2 | Time2 |
---|---|---|---|---|---|---|---|
1 | ABC | DUB | LCY | 22/0/22 | 15:23 | 2 | 15:27 |
3 | CBA | CPH | HEL | 22/0/22 | 12:21 | 4 | 12:19 |
I've been playing around with GroupBy & Pivot but haven't managed to crack it, any help is appreciated!
CodePudding user response:
So this will work if you have just two of each Reg row. But if you have more duplicate rows you will need something more robust.
import data and pandas
import pandas as pd
output
ID Reg Departure Arrival Date Time
0 1 ABC DUB LCY 22/0/22 15:23
1 2 ABC DUB LCY 22/0/22 15:27
2 3 CBA CPH HEL 22/0/22 12:21
3 4 CBA CPH HE 22/0/22 12:19
make two frames with with 'firsts' and one with 'lasts'
final_df = df.drop_duplicates(keep='first', subset='Departure')
id_time_df = df.drop_duplicates(keep='last', subset='Departure')
Then merge the two frames
pd.merge(final_df, id_time_df[['ID', 'Reg', 'Time']], on='Reg')
output
ID_x Reg Departure Arrival Date Time_x ID_y Time_y
0 1 ABC DUB LCY 22/0/22 15:23 2 15:27
1 3 CBA CPH HEL 22/0/22 12:21 4 12:19
CodePudding user response:
import pandas as pd
import io
str = '''ID Reg Departure Arrival Date Time
1 ABC DUB LCY 22/0/22 15:23
2 ABC DUB LCY 22/0/22 15:27
3 CBA CPH HEL 22/0/22 12:21
4 CBA CPH HEL 22/0/22 12:19'''
df = pd.read_csv(io.StringIO(str), sep = '\s ')
df["grpseq"] = df.groupby(["Reg", "Departure", "Arrival", "Date"]).cumcount()
df_new = df.set_index(["Reg", "Departure", "Arrival", "Date", "grpseq"]).unstack(level = -1).reset_index()
df_new.columns = ["Reg", "Departure", "Arrival", "Date", "ID1", "ID2", "Time1", "Time2"]
df_new
Reg Departure Arrival Date ID1 ID2 Time1 Time2
0 ABC DUB LCY 22/0/22 1 2 15:23 15:27
1 CBA CPH HEL 22/0/22 3 4 12:21 12:19
Where ID1
represent the ID
column in your desire table. And a simple reorder of the columns will be done.