I have the following long df:
df = pd.DataFrame({'stations':["Toronto","Toronto","Toronto","New York","New York","New York"],'forecast_date':["Jul 30","Jul 31","Aug 1","Jul 30","Jul 31","Aug 1"],'low':[58,57,59,70,72,71],'high':[65,66,64,88,87,86]})
print(df)
I want to pivot the table to a wide df that looks like this:
so I used the following function:
df = df.pivot_table(index = 'stations', columns = "forecast_date", values = ["high","low"],aggfunc = "first").reset_index()
print(df)
but with this, I get the following df:
So basically pd.pivot_table seems to be sorting the columns alphabetically, whereas I want it to be sorted in chronological order
Any help would be appreciated,
(Note that the dates are continuously changing, so other months will have a similar problem)
CodePudding user response:
You won't be able to prevent the sorting, but you can always enforce the original ordering by using .reindex
with the unique values from the column!
table = df.pivot_table(index = 'stations', columns = "forecast_date", values = ["high","low"],aggfunc = "first")
print(
table
)
high low
forecast_date Aug 1 Jul 30 Jul 31 Aug 1 Jul 30 Jul 31
stations
New York 86 88 87 71 70 72
Toronto 64 65 66 59 58 57
print(
table.reindex(columns=df['forecast_date'].unique(), level='forecast_date')
)
high low
forecast_date Jul 30 Jul 31 Aug 1 Jul 30 Jul 31 Aug 1
stations
New York 88 87 86 70 72 71
Toronto 65 66 64 58 57 59
Note that this is different than sorting in chronological order. To do that you would have to cast to a datetime
and sort on that.