Home > Enterprise >  Python: How to generate two new columns in a pandas df, based on the order of the values?
Python: How to generate two new columns in a pandas df, based on the order of the values?

Time:12-17

I have the following table as input:

x y
0 -0.872803 137.097977
1 -0.418766 821.549805
2 -0.657833 712.427856
3 -0.922091 126.871956
4 -0.847130 217.126068
5 0.692070 2166.090820
6 -0.858773 297.893188
7 -0.466285 634.510315
8 -0.774720 91.447876
9 -0.111050 1200.390625
10 0.325138 1759.597900

And I need to generate something like this:

x y pos_when_sorted_by_x pos_when_sorted_by_y
0 -0.872803 137.097977 9 8
1 -0.418766 821.549805 3 3
2 -0.657833 712.427856 5 4
3 -0.922091 126.871956 10 9
4 -0.847130 217.126068 7 7
5 0.692070 2166.090820 0 0
6 -0.858773 297.893188 8 6
7 -0.466285 634.510315 4 5
8 -0.774720 91.447876 6 10
9 -0.111050 1200.390625 2 2
10 0.325138 1759.597900 1 1

pos_when_sorted_by_x and pos_when_sorted_by_y are based on the location in the sorted dataframe by each of these columns.

CodePudding user response:

Use rank:

df[['x_pos', 'y_pos']] = df.agg('rank', ascending=False).sub(1).astype(int)
print(df)

# Output:
           x            y  x_pos  y_pos
0  -0.872803   137.097977      9      8
1  -0.418766   821.549805      3      3
2  -0.657833   712.427856      5      4
3  -0.922091   126.871956     10      9
4  -0.847130   217.126068      7      7
5   0.692070  2166.090820      0      0
6  -0.858773   297.893188      8      6
7  -0.466285   634.510315      4      5
8  -0.774720    91.447876      6     10
9  -0.111050  1200.390625      2      2
10  0.325138  1759.597900      1      1

An alternative with numpy and argsort:

df[['x_pos', 'y_pos']] = np.argsort(np.argsort(-1*df, axis=0), axis=0)
print(df)

# Output:
           x            y  x_pos  y_pos
0  -0.872803   137.097977      9      8
1  -0.418766   821.549805      3      3
2  -0.657833   712.427856      5      4
3  -0.922091   126.871956     10      9
4  -0.847130   217.126068      7      7
5   0.692070  2166.090820      0      0
6  -0.858773   297.893188      8      6
7  -0.466285   634.510315      4      5
8  -0.774720    91.447876      6     10
9  -0.111050  1200.390625      2      2
10  0.325138  1759.597900      1      1

Note: -1* is because argsort have no option to descending order.

CodePudding user response:

You can use pd.rank with ascending=False and subtract 1 so the rank starts at zero.

import pandas as pd
df = pd.DataFrame({'x': [-0.872803,
  -0.418766,
  -0.657833,
  -0.922091,
  -0.84713,
  0.69207,
  -0.858773,
  -0.466285,
  -0.77472,
  -0.11105,
  0.325138],
 'y': [137.097977,
  821.549805,
  712.427856,
  126.871956,
  217.126068,
  2166.09082,
  297.893188,
  634.510315,
  91.447876,
  1200.390625,
  1759.5979]})

df['pos_x'] = (df.x.rank(ascending=False)-1).astype(int)
df['pos_y'] = (df.y.rank(ascending=False)-1).astype(int)

Output

           x            y  pos_x  pos_y
0  -0.872803   137.097977      9      8
1  -0.418766   821.549805      3      3
2  -0.657833   712.427856      5      4
3  -0.922091   126.871956     10      9
4  -0.847130   217.126068      7      7
5   0.692070  2166.090820      0      0
6  -0.858773   297.893188      8      6
7  -0.466285   634.510315      4      5
8  -0.774720    91.447876      6     10
9  -0.111050  1200.390625      2      2
10  0.325138  1759.597900      1      1

CodePudding user response:

You can do the following too:

dfs_x = df.sort_values(by='x', ascending=False)
dfs_y = df.sort_values(by='y', ascending=False)
df['pos_x'] = df.index.map(dfs_x.index.get_loc)
df['pos_y'] = df.index.map(dfs_y.index.get_loc)
  • Related