Home > Software engineering >  Problem adapting my dataframe to a specific form
Problem adapting my dataframe to a specific form

Time:05-05

So I have a very big problem I would like help with. I have a dataframe that looks like this:

Route Day Time Tickets Sold Revenue
SD - LA 04-01-22 8:00 45 210
SD - LA 04-05-22 8:00 89 293
SD - LA 04-06-22 8:00 60 252
SD - LA 04-07-22 8:00 91 310
SD - LA 04-09-22 8:00 129 470
SD - SF 04-01-22 10:00 43 200
SD - SF 04-05-22 10:00 52 240
SD - SF 04-06-22 10:00 71 320
SD - SF 04-07-22 10:00 67 300
SD - SF 04-09-22 10:00 70 310

I would like to transform it to something like this:

Route Time TS 04-01-22 R 04-01-22 TS 04-05-22 R 04-05-22 TS 04-06-22 R 04-06-22
SD-LA 8:00 45 210 89 293 60 252
SD-SF 10:00 43 200 52 240 71 320

Where R = Revenue and TS = Tickets Sold

I would like to include all the dates!

I know something like a reshape would work.. but honestly I don't know how to do it, because I want my columns to have different names

Thanks in advance, all help is appreciated!

I want to do this basically because I want a time series that shows me how the tickets sold and the revenue have evolved during time!

CodePudding user response:

You want to pivot the df:

header = ['Route', 'Day', 'Time', 'Tickets Sold', 'Revenue']

rows = [
    ['SD - LA', '04-01-22', '8:00', '45', '210'],
    ['SD - LA', '04-05-22', '8:00', '89', '293'],
    ['SD - LA', '04-06-22', '8:00', '60', '252'],
    ['SD - LA', '04-07-22', '8:00', '91', '310'],
    ['SD - LA', '04-09-22', '8:00', '129', '470'],
    ['SD - SF', '04-01-22', '10:00', '43', '200'],
    ['SD - SF', '04-05-22', '10:00', '52', '240'],
    ['SD - SF', '04-06-22', '10:00', '71', '320'],
    ['SD - SF', '04-07-22', '10:00', '67', '300'],
    ['SD - SF', '04-09-22', '10:00', '70', '310']
]

# store your data as df
df = pd.DataFrame(
    rows, 
    columns=header
)

# PIVOT!!!
df = df.pivot(
    values = ['Tickets Sold', 'Revenue'],
    index=['Route', 'Time'], 
    columns = 'Day'
).reset_index()

# rename columns
df.columns = [' '.join(c) for c in df.columns]

df

CodePudding user response:

Here a way to get the columns ordered correctly (since the order of your dates in format %m-%d-%y might be messed up at year change):

df.rename({"Revenue": "R", "Tickets Sold": "TS"}, axis=1, inplace=True)

df = df.pivot_table(index=["Route", "Time"], columns="Day", values=["TS", "R"]).reset_index()
# Switching to single index columns:
df.columns = [' '.join(col).strip() for col in df.columns]

sorted_col_date = ["Route", "Time"]   sorted(df.columns[2:], key = lambda x: (pd.to_datetime(x.split()[1], format="%m-%d-%y"), -ord(x.split()[0][0])))
df = df.reindex(columns=sorted_col_date)

print(df)

Output:

     Route   Time  TS 04-01-22  R 04-01-22  TS 04-05-22  R 04-05-22  TS 04-06-22  R 04-06-22  TS 04-07-22  R 04-07-22  TS 04-09-22  R 04-09-22
0  SD - LA   8:00           45         210           89         293           60         252           91         310          129         470
1  SD - SF  10:00           43         200           52         240           71         320           67         300           70         310
  • Related