Hi I want to keep the column infoid order unchanged but sort date in increasing order(acsending) Is that possible?
statisticsdate infoid
20230108 46726004
20230106 46726004
20230108 46725082
20230107 46725082
20230108 46725081
20230108 46724162
20230108 46720662
should be like:
statisticsdate infoid
20230106 46726004
20230108 46726004
20230107 46725082
20230108 46725082
20230108 46725081
20230108 46724162
20230108 46720662
CodePudding user response:
This should do the trick:
>>> df.groupby('id').transform(lambda x: x.sort_values()).join(df['id'])
date id
0 20230106 46726004
1 20230108 46726004
2 20230107 46725082
3 20230108 46725082
4 20230108 46725081
5 20230108 46724162
6 20230108 46720662
Or without a join:
>>> df.set_index('id').groupby('id').transform(lambda x: x.sort_values()).reset_index()
id date
0 46726004 20230106
1 46726004 20230108
2 46725082 20230107
3 46725082 20230108
4 46725081 20230108
5 46724162 20230108
6 46720662 20230108
CodePudding user response:
If possible sorting infoid
descending and infoid
ascending use DataFrame.sort_values
only:
df = df.sort_values(['infoid','statisticsdate'], ascending=[False, True], ignore_index=True)
print (df)
statisticsdate infoid
0 20230106 46726004
1 20230108 46726004
2 20230107 46725082
3 20230108 46725082
4 20230108 46725081
5 20230108 46724162
6 20230108 46720662
First idea is sorting per groups by custom lambda function with sort=False
parameter for no sorting groups, solution is slow if larger DataFrame:
df['statisticsdate'] = df.groupby('infoid', sort=False, group_keys=False)['statisticsdate'].apply(lambda x: x.sort_values()).to_numpy()
print (df)
statisticsdate infoid
0 20230106 46726004
1 20230108 46726004
2 20230107 46725082
3 20230108 46725082
4 20230108 46725081
5 20230108 46724162
6 20230108 46720662
Or you can convert infoid
to ordered Categorical and sorting by both columns:
df['statisticsdate'] = df.assign(infoid = pd.Categorical(df['infoid'], ordered=True, categories=df['infoid'].unique())).sort_values(['infoid','statisticsdate'])['statisticsdate']
print (df)
statisticsdate infoid
0 20230108 46726004
1 20230106 46726004
2 20230108 46725082
3 20230107 46725082
4 20230108 46725081
5 20230108 46724162
6 20230108 46720662
If soe groups are not sorted is necessary use:
print (df)
statisticsdate infoid
0 20230108 46726004
1 20230106 46726004
2 20230108 46725082
3 20230107 46725082
4 20230108 46725081
5 20230108 46724162
6 20230108 46720662
7 20230108 46726004 <- not sorted group 46726004
8 20230106 46726004
df['statisticsdate'] = (df.assign(infoid = df['infoid'].ne(df['infoid'].shift()).cumsum())
.sort_values(['infoid','statisticsdate'], ignore_index=True)['statisticsdate'])
print (df)
statisticsdate infoid
0 20230106 46726004
1 20230108 46726004
2 20230107 46725082
3 20230108 46725082
4 20230108 46725081
5 20230108 46724162
6 20230108 46720662
7 20230106 46726004
8 20230108 46726004