Home > Software design >  Python pandas keep first columns' order unchanged while second col sort by ascending order
Python pandas keep first columns' order unchanged while second col sort by ascending order

Time:01-09

Hi I want to keep the column infoid order unchanged but sort date in increasing order(acsending) Is that possible?

statisticsdate  infoid 
20230108      46726004
20230106      46726004
20230108      46725082
20230107      46725082
20230108      46725081
20230108      46724162
20230108      46720662 

should be like:

statisticsdate  infoid 
20230106      46726004
20230108      46726004
20230107      46725082
20230108      46725082
20230108      46725081
20230108      46724162
20230108      46720662 

CodePudding user response:

This should do the trick:

>>> df.groupby('id').transform(lambda x: x.sort_values()).join(df['id'])

       date        id
0  20230106  46726004
1  20230108  46726004
2  20230107  46725082
3  20230108  46725082
4  20230108  46725081
5  20230108  46724162
6  20230108  46720662

Or without a join:

>>> df.set_index('id').groupby('id').transform(lambda x: x.sort_values()).reset_index()

         id      date
0  46726004  20230106
1  46726004  20230108
2  46725082  20230107
3  46725082  20230108
4  46725081  20230108
5  46724162  20230108
6  46720662  20230108

CodePudding user response:

If possible sorting infoid descending and infoid ascending use DataFrame.sort_values only:

df = df.sort_values(['infoid','statisticsdate'], ascending=[False, True], ignore_index=True)
print (df)
   statisticsdate    infoid
0        20230106  46726004
1        20230108  46726004
2        20230107  46725082
3        20230108  46725082
4        20230108  46725081
5        20230108  46724162
6        20230108  46720662

First idea is sorting per groups by custom lambda function with sort=False parameter for no sorting groups, solution is slow if larger DataFrame:

df['statisticsdate'] = df.groupby('infoid', sort=False, group_keys=False)['statisticsdate'].apply(lambda x: x.sort_values()).to_numpy()
print (df)
   statisticsdate    infoid
0        20230106  46726004
1        20230108  46726004
2        20230107  46725082
3        20230108  46725082
4        20230108  46725081
5        20230108  46724162
6        20230108  46720662

Or you can convert infoid to ordered Categorical and sorting by both columns:

df['statisticsdate'] = df.assign(infoid = pd.Categorical(df['infoid'], ordered=True, categories=df['infoid'].unique())).sort_values(['infoid','statisticsdate'])['statisticsdate']
print (df)
   statisticsdate    infoid
0        20230108  46726004
1        20230106  46726004
2        20230108  46725082
3        20230107  46725082
4        20230108  46725081
5        20230108  46724162
6        20230108  46720662

If soe groups are not sorted is necessary use:

print (df)
   statisticsdate    infoid
0        20230108  46726004
1        20230106  46726004
2        20230108  46725082
3        20230107  46725082
4        20230108  46725081
5        20230108  46724162
6        20230108  46720662
7        20230108  46726004 <- not sorted group 46726004
8        20230106  46726004


df['statisticsdate'] = (df.assign(infoid = df['infoid'].ne(df['infoid'].shift()).cumsum())
                          .sort_values(['infoid','statisticsdate'], ignore_index=True)['statisticsdate'])
print (df)
   statisticsdate    infoid
0        20230106  46726004
1        20230108  46726004
2        20230107  46725082
3        20230108  46725082
4        20230108  46725081
5        20230108  46724162
6        20230108  46720662
7        20230106  46726004
8        20230108  46726004
  • Related