I'm trying to pivot a dataframe in pandas to produce heatmaps (pandas version 1.4.3). The issue is that after pivoting, the original sorting of the index column is lost. Since my data represent samples from geographical locations, I need them to be sorted by latitude (which is how they are in the 'TILE' column in the example below).
mwe:
dummy = [{'TILE':'N59TE010A','METRIC':'ELD_RMSE','LOW':2},
{'TILE':'N59TE009G','METRIC':'ELD_RMSE','LOW':2},
{'TILE':'N59RE009G','METRIC':'ELD_RMSE','LOW':1},
{'TILE':'N59TE010B','METRIC':'ELD_RMSE','LOW':2},
{'TILE':'N59TE010C','METRIC':'ELD_RMSE','LOW':2},
{'TILE':'S24TW047F','METRIC':'RUF_RMSE','LOW':2},
{'TILE':'S24TW047G','METRIC':'ELD_LE90','LOW':2},
{'TILE':'S24MW047D','METRIC':'SMD_LE90','LOW':2},
{'TILE':'S24MW047C','METRIC':'RUF_RMSE','LOW':0},
{'TILE':'S24MW047D','METRIC':'RUF_RMSE','LOW':0}]
df = pd.DataFrame.from_dict(dummy)
df
TILE METRIC LOW
0 N59TE010A ELD_RMSE 2
1 N59TE009G ELD_RMSE 2
2 N59RE009G ELD_RMSE 1
3 N59TE010B ELD_RMSE 2
4 N59TE010C ELD_RMSE 2
5 S24TW047F RUF_RMSE 2
6 S24TW047G ELD_LE90 2
7 S24MW047D SMD_LE90 2
8 S24MW047C RUF_RMSE 0
9 S24MW047D RUF_RMSE 0
df.pivot(index='TILE', columns='METRIC', values='LOW')
METRIC ELD_LE90 ELD_RMSE RUF_RMSE SMD_LE90
TILE
N59RE009G NaN 1.0 NaN NaN
N59TE009G NaN 2.0 NaN NaN
N59TE010A NaN 2.0 NaN NaN
N59TE010B NaN 2.0 NaN NaN
N59TE010C NaN 2.0 NaN NaN
S24MW047C NaN NaN 0.0 NaN
S24MW047D NaN NaN 0.0 2.0
S24TW047F NaN NaN 2.0 NaN
S24TW047G 2.0 NaN NaN NaN
Never mind the NaN values, the point is that the first row should have tile N59TE010A, and not N59RE009G (and so on).
I've been trying a few solutions I found here and elsewhere but without luck. Is there a way to preserve the sorting of the 'TILE' column?
Thanks
CodePudding user response:
You can use pivot_table
that has more options, including sort=False
:
df.pivot_table(index='TILE', columns='METRIC', values='LOW', sort=False)
Another option could be to add a dummy column to use as index with the desired order, using for example pandas.factorize
to keep the original order.
(df.assign(idx=pd.factorize(df['TILE'])[0])
.pivot(index=['idx', 'TILE'], columns='METRIC', values='LOW')
.droplevel('idx')
)
output:
METRIC ELD_LE90 ELD_RMSE RUF_RMSE SMD_LE90
TILE
N59TE010A NaN 2.0 NaN NaN
N59TE009G NaN 2.0 NaN NaN
N59RE009G NaN 1.0 NaN NaN
N59TE010B NaN 2.0 NaN NaN
N59TE010C NaN 2.0 NaN NaN
S24TW047F NaN NaN 2.0 NaN
S24TW047G 2.0 NaN NaN NaN
S24MW047D NaN NaN 0.0 2.0
S24MW047C NaN NaN 0.0 NaN