I have the following table as input:
x | y | |
---|---|---|
0 | -0.872803 | 137.097977 |
1 | -0.418766 | 821.549805 |
2 | -0.657833 | 712.427856 |
3 | -0.922091 | 126.871956 |
4 | -0.847130 | 217.126068 |
5 | 0.692070 | 2166.090820 |
6 | -0.858773 | 297.893188 |
7 | -0.466285 | 634.510315 |
8 | -0.774720 | 91.447876 |
9 | -0.111050 | 1200.390625 |
10 | 0.325138 | 1759.597900 |
And I need to generate something like this:
x | y | pos_when_sorted_by_x | pos_when_sorted_by_y | |
---|---|---|---|---|
0 | -0.872803 | 137.097977 | 9 | 8 |
1 | -0.418766 | 821.549805 | 3 | 3 |
2 | -0.657833 | 712.427856 | 5 | 4 |
3 | -0.922091 | 126.871956 | 10 | 9 |
4 | -0.847130 | 217.126068 | 7 | 7 |
5 | 0.692070 | 2166.090820 | 0 | 0 |
6 | -0.858773 | 297.893188 | 8 | 6 |
7 | -0.466285 | 634.510315 | 4 | 5 |
8 | -0.774720 | 91.447876 | 6 | 10 |
9 | -0.111050 | 1200.390625 | 2 | 2 |
10 | 0.325138 | 1759.597900 | 1 | 1 |
pos_when_sorted_by_x
and pos_when_sorted_by_y
are based on the location in the sorted dataframe by each of these columns.
CodePudding user response:
Use rank
:
df[['x_pos', 'y_pos']] = df.agg('rank', ascending=False).sub(1).astype(int)
print(df)
# Output:
x y x_pos y_pos
0 -0.872803 137.097977 9 8
1 -0.418766 821.549805 3 3
2 -0.657833 712.427856 5 4
3 -0.922091 126.871956 10 9
4 -0.847130 217.126068 7 7
5 0.692070 2166.090820 0 0
6 -0.858773 297.893188 8 6
7 -0.466285 634.510315 4 5
8 -0.774720 91.447876 6 10
9 -0.111050 1200.390625 2 2
10 0.325138 1759.597900 1 1
An alternative with numpy and argsort
:
df[['x_pos', 'y_pos']] = np.argsort(np.argsort(-1*df, axis=0), axis=0)
print(df)
# Output:
x y x_pos y_pos
0 -0.872803 137.097977 9 8
1 -0.418766 821.549805 3 3
2 -0.657833 712.427856 5 4
3 -0.922091 126.871956 10 9
4 -0.847130 217.126068 7 7
5 0.692070 2166.090820 0 0
6 -0.858773 297.893188 8 6
7 -0.466285 634.510315 4 5
8 -0.774720 91.447876 6 10
9 -0.111050 1200.390625 2 2
10 0.325138 1759.597900 1 1
Note: -1*
is because argsort
have no option to descending order.
CodePudding user response:
You can use pd.rank
with ascending=False
and subtract 1 so the rank starts at zero.
import pandas as pd
df = pd.DataFrame({'x': [-0.872803,
-0.418766,
-0.657833,
-0.922091,
-0.84713,
0.69207,
-0.858773,
-0.466285,
-0.77472,
-0.11105,
0.325138],
'y': [137.097977,
821.549805,
712.427856,
126.871956,
217.126068,
2166.09082,
297.893188,
634.510315,
91.447876,
1200.390625,
1759.5979]})
df['pos_x'] = (df.x.rank(ascending=False)-1).astype(int)
df['pos_y'] = (df.y.rank(ascending=False)-1).astype(int)
Output
x y pos_x pos_y
0 -0.872803 137.097977 9 8
1 -0.418766 821.549805 3 3
2 -0.657833 712.427856 5 4
3 -0.922091 126.871956 10 9
4 -0.847130 217.126068 7 7
5 0.692070 2166.090820 0 0
6 -0.858773 297.893188 8 6
7 -0.466285 634.510315 4 5
8 -0.774720 91.447876 6 10
9 -0.111050 1200.390625 2 2
10 0.325138 1759.597900 1 1
CodePudding user response:
You can do the following too:
dfs_x = df.sort_values(by='x', ascending=False)
dfs_y = df.sort_values(by='y', ascending=False)
df['pos_x'] = df.index.map(dfs_x.index.get_loc)
df['pos_y'] = df.index.map(dfs_y.index.get_loc)