Home > OS >  How prevent pd.pivot_table from sorting columns
How prevent pd.pivot_table from sorting columns

Time:07-19

I have the following long df:

df = pd.DataFrame({'stations':["Toronto","Toronto","Toronto","New York","New York","New York"],'forecast_date':["Jul 30","Jul 31","Aug 1","Jul 30","Jul 31","Aug 1"],'low':[58,57,59,70,72,71],'high':[65,66,64,88,87,86]})
print(df)

I want to pivot the table to a wide df that looks like this:

Desired Output

so I used the following function:

df = df.pivot_table(index = 'stations', columns = "forecast_date", values = ["high","low"],aggfunc = "first").reset_index()
print(df)

but with this, I get the following df:

Output Received (Undesired)

So basically pd.pivot_table seems to be sorting the columns alphabetically, whereas I want it to be sorted in chronological order

Any help would be appreciated,

(Note that the dates are continuously changing, so other months will have a similar problem)

CodePudding user response:

You won't be able to prevent the sorting, but you can always enforce the original ordering by using .reindex with the unique values from the column!

table = df.pivot_table(index = 'stations', columns = "forecast_date", values = ["high","low"],aggfunc = "first")

print(
    table
)
               high                 low              
forecast_date Aug 1 Jul 30 Jul 31 Aug 1 Jul 30 Jul 31
stations                                             
New York         86     88     87    71     70     72
Toronto          64     65     66    59     58     57


print(
    table.reindex(columns=df['forecast_date'].unique(), level='forecast_date')
)
                high                 low             
forecast_date Jul 30 Jul 31 Aug 1 Jul 30 Jul 31 Aug 1
stations                                             
New York          88     87    86     70     72    71
Toronto           65     66    64     58     57    59

Note that this is different than sorting in chronological order. To do that you would have to cast to a datetime and sort on that.

  • Related