Problem adapting my dataframe to a specific form-CodePudding

So I have a very big problem I would like help with. I have a dataframe that looks like this:

Route	Day	Time	Tickets Sold	Revenue
SD - LA	04-01-22	8:00	45	210
SD - LA	04-05-22	8:00	89	293
SD - LA	04-06-22	8:00	60	252
SD - LA	04-07-22	8:00	91	310
SD - LA	04-09-22	8:00	129	470
SD - SF	04-01-22	10:00	43	200
SD - SF	04-05-22	10:00	52	240
SD - SF	04-06-22	10:00	71	320
SD - SF	04-07-22	10:00	67	300
SD - SF	04-09-22	10:00	70	310

I would like to transform it to something like this:

Route	Time	TS 04-01-22	R 04-01-22	TS 04-05-22	R 04-05-22	TS 04-06-22	R 04-06-22
SD-LA	8:00	45	210	89	293	60	252
SD-SF	10:00	43	200	52	240	71	320

Where R = Revenue and TS = Tickets Sold

I would like to include all the dates!

I know something like a reshape would work.. but honestly I don't know how to do it, because I want my columns to have different names

Thanks in advance, all help is appreciated!

I want to do this basically because I want a time series that shows me how the tickets sold and the revenue have evolved during time!

CodePudding user response：

You want to pivot the df:

header = ['Route', 'Day', 'Time', 'Tickets Sold', 'Revenue']

rows = [
    ['SD - LA', '04-01-22', '8:00', '45', '210'],
    ['SD - LA', '04-05-22', '8:00', '89', '293'],
    ['SD - LA', '04-06-22', '8:00', '60', '252'],
    ['SD - LA', '04-07-22', '8:00', '91', '310'],
    ['SD - LA', '04-09-22', '8:00', '129', '470'],
    ['SD - SF', '04-01-22', '10:00', '43', '200'],
    ['SD - SF', '04-05-22', '10:00', '52', '240'],
    ['SD - SF', '04-06-22', '10:00', '71', '320'],
    ['SD - SF', '04-07-22', '10:00', '67', '300'],
    ['SD - SF', '04-09-22', '10:00', '70', '310']
]

# store your data as df
df = pd.DataFrame(
    rows, 
    columns=header
)

# PIVOT!!!
df = df.pivot(
    values = ['Tickets Sold', 'Revenue'],
    index=['Route', 'Time'], 
    columns = 'Day'
).reset_index()

# rename columns
df.columns = [' '.join(c) for c in df.columns]

df

CodePudding user response：

Here a way to get the columns ordered correctly (since the order of your dates in format %m-%d-%y might be messed up at year change):

df.rename({"Revenue": "R", "Tickets Sold": "TS"}, axis=1, inplace=True)

df = df.pivot_table(index=["Route", "Time"], columns="Day", values=["TS", "R"]).reset_index()
# Switching to single index columns:
df.columns = [' '.join(col).strip() for col in df.columns]

sorted_col_date = ["Route", "Time"]   sorted(df.columns[2:], key = lambda x: (pd.to_datetime(x.split()[1], format="%m-%d-%y"), -ord(x.split()[0][0])))
df = df.reindex(columns=sorted_col_date)

print(df)

Output:

     Route   Time  TS 04-01-22  R 04-01-22  TS 04-05-22  R 04-05-22  TS 04-06-22  R 04-06-22  TS 04-07-22  R 04-07-22  TS 04-09-22  R 04-09-22
0  SD - LA   8:00           45         210           89         293           60         252           91         310          129         470
1  SD - SF  10:00           43         200           52         240           71         320           67         300           70         310