Home > front end >  Pandas pivot - how to keep original sorting of index?
Pandas pivot - how to keep original sorting of index?

Time:08-20

I'm trying to pivot a dataframe in pandas to produce heatmaps (pandas version 1.4.3). The issue is that after pivoting, the original sorting of the index column is lost. Since my data represent samples from geographical locations, I need them to be sorted by latitude (which is how they are in the 'TILE' column in the example below).

mwe:

dummy = [{'TILE':'N59TE010A','METRIC':'ELD_RMSE','LOW':2},
{'TILE':'N59TE009G','METRIC':'ELD_RMSE','LOW':2},
{'TILE':'N59RE009G','METRIC':'ELD_RMSE','LOW':1},
{'TILE':'N59TE010B','METRIC':'ELD_RMSE','LOW':2},
{'TILE':'N59TE010C','METRIC':'ELD_RMSE','LOW':2},
{'TILE':'S24TW047F','METRIC':'RUF_RMSE','LOW':2},
{'TILE':'S24TW047G','METRIC':'ELD_LE90','LOW':2},
{'TILE':'S24MW047D','METRIC':'SMD_LE90','LOW':2},
{'TILE':'S24MW047C','METRIC':'RUF_RMSE','LOW':0},
{'TILE':'S24MW047D','METRIC':'RUF_RMSE','LOW':0}]
df = pd.DataFrame.from_dict(dummy)

df
        TILE    METRIC  LOW
0  N59TE010A  ELD_RMSE    2
1  N59TE009G  ELD_RMSE    2
2  N59RE009G  ELD_RMSE    1
3  N59TE010B  ELD_RMSE    2
4  N59TE010C  ELD_RMSE    2
5  S24TW047F  RUF_RMSE    2
6  S24TW047G  ELD_LE90    2
7  S24MW047D  SMD_LE90    2
8  S24MW047C  RUF_RMSE    0
9  S24MW047D  RUF_RMSE    0


df.pivot(index='TILE', columns='METRIC', values='LOW')

METRIC     ELD_LE90  ELD_RMSE  RUF_RMSE  SMD_LE90
TILE                                             
N59RE009G       NaN       1.0       NaN       NaN
N59TE009G       NaN       2.0       NaN       NaN
N59TE010A       NaN       2.0       NaN       NaN
N59TE010B       NaN       2.0       NaN       NaN
N59TE010C       NaN       2.0       NaN       NaN
S24MW047C       NaN       NaN       0.0       NaN
S24MW047D       NaN       NaN       0.0       2.0
S24TW047F       NaN       NaN       2.0       NaN
S24TW047G       2.0       NaN       NaN       NaN

Never mind the NaN values, the point is that the first row should have tile N59TE010A, and not N59RE009G (and so on).

I've been trying a few solutions I found here and elsewhere but without luck. Is there a way to preserve the sorting of the 'TILE' column?

Thanks

CodePudding user response:

You can use pivot_table that has more options, including sort=False:

df.pivot_table(index='TILE', columns='METRIC', values='LOW', sort=False)

Another option could be to add a dummy column to use as index with the desired order, using for example pandas.factorize to keep the original order.

(df.assign(idx=pd.factorize(df['TILE'])[0])
   .pivot(index=['idx', 'TILE'], columns='METRIC', values='LOW')
   .droplevel('idx')
)

output:

METRIC     ELD_LE90  ELD_RMSE  RUF_RMSE  SMD_LE90
TILE                                             
N59TE010A       NaN       2.0       NaN       NaN
N59TE009G       NaN       2.0       NaN       NaN
N59RE009G       NaN       1.0       NaN       NaN
N59TE010B       NaN       2.0       NaN       NaN
N59TE010C       NaN       2.0       NaN       NaN
S24TW047F       NaN       NaN       2.0       NaN
S24TW047G       2.0       NaN       NaN       NaN
S24MW047D       NaN       NaN       0.0       2.0
S24MW047C       NaN       NaN       0.0       NaN
  • Related