Python: How to generate two new columns in a pandas DataFrame based on the order of the values?

I have the following table as input:

	x	y
0	-0.872803	137.097977
1	-0.418766	821.549805
2	-0.657833	712.427856
3	-0.922091	126.871956
4	-0.847130	217.126068
5	0.692070	2166.090820
6	-0.858773	297.893188
7	-0.466285	634.510315
8	-0.774720	91.447876
9	-0.111050	1200.390625
10	0.325138	1759.597900

And I need to generate something like this:

	x	y	pos_when_sorted_by_x	pos_when_sorted_by_y
0	-0.872803	137.097977	9	8
1	-0.418766	821.549805	3	3
2	-0.657833	712.427856	5	4
3	-0.922091	126.871956	10	9
4	-0.847130	217.126068	7	7
5	0.692070	2166.090820	0	0
6	-0.858773	297.893188	8	6
7	-0.466285	634.510315	4	5
8	-0.774720	91.447876	6	10
9	-0.111050	1200.390625	2	2
10	0.325138	1759.597900	1	1

pos_when_sorted_by_x and pos_when_sorted_by_y give each row's zero-based position when the DataFrame is sorted by x or by y, respectively, in descending order.

CodePudding user response:

Use rank with ascending=False, then subtract 1 so the positions start at zero:

df[['x_pos', 'y_pos']] = df[['x', 'y']].rank(ascending=False).sub(1).astype(int)
print(df)

# Output:
           x            y  x_pos  y_pos
0  -0.872803   137.097977      9      8
1  -0.418766   821.549805      3      3
2  -0.657833   712.427856      5      4
3  -0.922091   126.871956     10      9
4  -0.847130   217.126068      7      7
5   0.692070  2166.090820      0      0
6  -0.858773   297.893188      8      6
7  -0.466285   634.510315      4      5
8  -0.774720    91.447876      6     10
9  -0.111050  1200.390625      2      2
10  0.325138  1759.597900      1      1
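
One caveat with this approach: by default rank uses method='average', so tied values share a fractional rank, which astype(int) then truncates. The question's data has no ties, so this does not matter there. A minimal sketch of the difference, assuming ties should instead get distinct positions (the small frame `tied` is made up for illustration and is not from the question):

```python
import pandas as pd

# Hypothetical data with a tie in column 'x' (not from the question).
tied = pd.DataFrame({'x': [3.0, 1.0, 3.0, 2.0]})

# Default method='average': the two 3.0 values share rank 1.5.
avg = tied['x'].rank(ascending=False)
print(avg.tolist())  # [1.5, 4.0, 1.5, 3.0]

# method='first' gives every row its own rank, so the zero-based
# positions form a permutation of 0..n-1 even when values are tied.
first = tied['x'].rank(ascending=False, method='first').sub(1).astype(int)
print(sorted(first.tolist()))  # [0, 1, 2, 3]
```

Whether method='first', 'min', or 'dense' is appropriate depends on how tied rows should be treated, which the question does not specify.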

An alternative with NumPy, applying argsort twice:

import numpy as np

df[['x_pos', 'y_pos']] = np.argsort(np.argsort(-df[['x', 'y']].to_numpy(), axis=0), axis=0)
print(df)

# Output:
           x            y  x_pos  y_pos
0  -0.872803   137.097977      9      8
1  -0.418766   821.549805      3      3
2  -0.657833   712.427856      5      4
3  -0.922091   126.871956     10      9
4  -0.847130   217.126068      7      7
5   0.692070  2166.090820      0      0
6  -0.858773   297.893188      8      6
7  -0.466285   634.510315      4      5
8  -0.774720    91.447876      6     10
9  -0.111050  1200.390625      2      2
10  0.325138  1759.597900      1      1

Note: the values are negated because argsort has no option to sort in descending order.
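
To see why applying argsort twice yields positions, it can help to look at a single column in isolation. A small sketch using the first four x values from the question (the variable names `order` and `pos` are illustrative):

```python
import numpy as np

x = np.array([-0.872803, -0.418766, -0.657833, -0.922091])

# First argsort (on the negated values): row indices in descending order of x.
order = np.argsort(-x)

# Second argsort: inverts that permutation, so pos[i] is the position
# of row i within the descending order.
pos = np.argsort(order)

print(order)  # [1 2 0 3]
print(pos)    # [2 0 1 3]
```

Row 1 holds the largest of these four values, so it gets position 0; row 3 holds the smallest, so it gets position 3.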

CodePudding user response:

You can use Series.rank with ascending=False and subtract 1 so the positions start at zero.

import pandas as pd
df = pd.DataFrame({'x': [-0.872803,
  -0.418766,
  -0.657833,
  -0.922091,
  -0.84713,
  0.69207,
  -0.858773,
  -0.466285,
  -0.77472,
  -0.11105,
  0.325138],
 'y': [137.097977,
  821.549805,
  712.427856,
  126.871956,
  217.126068,
  2166.09082,
  297.893188,
  634.510315,
  91.447876,
  1200.390625,
  1759.5979]})

df['pos_x'] = (df.x.rank(ascending=False)-1).astype(int)
df['pos_y'] = (df.y.rank(ascending=False)-1).astype(int)

Output

           x            y  pos_x  pos_y
0  -0.872803   137.097977      9      8
1  -0.418766   821.549805      3      3
2  -0.657833   712.427856      5      4
3  -0.922091   126.871956     10      9
4  -0.847130   217.126068      7      7
5   0.692070  2166.090820      0      0
6  -0.858773   297.893188      8      6
7  -0.466285   634.510315      4      5
8  -0.774720    91.447876      6     10
9  -0.111050  1200.390625      2      2
10  0.325138  1759.597900      1      1

CodePudding user response:

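You can also sort by each column and look up where each original index label lands in the sorted frame:

# Sort by each column in descending order; the original index labels are kept.
dfs_x = df.sort_values(by='x', ascending=False)
dfs_y = df.sort_values(by='y', ascending=False)
# get_loc returns the integer position of a label within the sorted index.
df['pos_x'] = df.index.map(dfs_x.index.get_loc)
df['pos_y'] = df.index.map(dfs_y.index.get_loc)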
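
As a quick sanity check, the sort-based positions can be compared with the rank-based ones. A sketch on a three-row subset of the question's data, assuming the index is unique (get_loc only returns a single integer for a label that occurs once); the names `sorted_by_x`, `pos_sort`, and `pos_rank` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'x': [-0.872803, -0.418766, 0.692070],
                   'y': [137.097977, 821.549805, 2166.090820]})

# Position of each original index label within the frame sorted by x (descending).
sorted_by_x = df.sort_values(by='x', ascending=False)
pos_sort = df.index.map(sorted_by_x.index.get_loc)

# The same positions via rank, shifted to start at zero.
pos_rank = df['x'].rank(ascending=False).sub(1).astype(int)

print(list(pos_sort))     # [2, 1, 0]
print(pos_rank.tolist())  # [2, 1, 0]
```

On data without ties, both approaches should agree; with ties, the result depends on the sort's tie order and on the rank method chosen.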