So I have a very big problem I would like help with. I have a dataframe that looks like this:
Route | Day | Time | Tickets Sold | Revenue |
---|---|---|---|---|
SD - LA | 04-01-22 | 8:00 | 45 | 210 |
SD - LA | 04-05-22 | 8:00 | 89 | 293 |
SD - LA | 04-06-22 | 8:00 | 60 | 252 |
SD - LA | 04-07-22 | 8:00 | 91 | 310 |
SD - LA | 04-09-22 | 8:00 | 129 | 470 |
SD - SF | 04-01-22 | 10:00 | 43 | 200 |
SD - SF | 04-05-22 | 10:00 | 52 | 240 |
SD - SF | 04-06-22 | 10:00 | 71 | 320 |
SD - SF | 04-07-22 | 10:00 | 67 | 300 |
SD - SF | 04-09-22 | 10:00 | 70 | 310 |
I would like to transform it to something like this:
Route | Time | TS 04-01-22 | R 04-01-22 | TS 04-05-22 | R 04-05-22 | TS 04-06-22 | R 04-06-22 |
---|---|---|---|---|---|---|---|
SD-LA | 8:00 | 45 | 210 | 89 | 293 | 60 | 252 |
SD-SF | 10:00 | 43 | 200 | 52 | 240 | 71 | 320 |
Where R = Revenue and TS = Tickets Sold
I would like to include all the dates!
I know something like a reshape would work.. but honestly I don't know how to do it, because I want my columns to have different names
Thanks in advance, all help is appreciated!
I want to do this basically because I want a time series that shows me how the tickets sold and the revenue have evolved during time!
CodePudding user response:
You want to pivot
the df:
header = ['Route', 'Day', 'Time', 'Tickets Sold', 'Revenue']
rows = [
['SD - LA', '04-01-22', '8:00', '45', '210'],
['SD - LA', '04-05-22', '8:00', '89', '293'],
['SD - LA', '04-06-22', '8:00', '60', '252'],
['SD - LA', '04-07-22', '8:00', '91', '310'],
['SD - LA', '04-09-22', '8:00', '129', '470'],
['SD - SF', '04-01-22', '10:00', '43', '200'],
['SD - SF', '04-05-22', '10:00', '52', '240'],
['SD - SF', '04-06-22', '10:00', '71', '320'],
['SD - SF', '04-07-22', '10:00', '67', '300'],
['SD - SF', '04-09-22', '10:00', '70', '310']
]
# store your data as df
df = pd.DataFrame(
rows,
columns=header
)
# PIVOT!!!
df = df.pivot(
values = ['Tickets Sold', 'Revenue'],
index=['Route', 'Time'],
columns = 'Day'
).reset_index()
# rename columns
df.columns = [' '.join(c) for c in df.columns]
df
CodePudding user response:
Here a way to get the columns ordered correctly (since the order of your dates in format %m-%d-%y
might be messed up at year change):
df.rename({"Revenue": "R", "Tickets Sold": "TS"}, axis=1, inplace=True)
df = df.pivot_table(index=["Route", "Time"], columns="Day", values=["TS", "R"]).reset_index()
# Switching to single index columns:
df.columns = [' '.join(col).strip() for col in df.columns]
sorted_col_date = ["Route", "Time"] sorted(df.columns[2:], key = lambda x: (pd.to_datetime(x.split()[1], format="%m-%d-%y"), -ord(x.split()[0][0])))
df = df.reindex(columns=sorted_col_date)
print(df)
Output:
Route Time TS 04-01-22 R 04-01-22 TS 04-05-22 R 04-05-22 TS 04-06-22 R 04-06-22 TS 04-07-22 R 04-07-22 TS 04-09-22 R 04-09-22
0 SD - LA 8:00 45 210 89 293 60 252 91 310 129 470
1 SD - SF 10:00 43 200 52 240 71 320 67 300 70 310