I have a dataframe which consists of two columns:
x y
0 1 2
1 2 4
2 3 6
3 4 8
4 5 10
5 6 12
6 7 14
7 8 16
8 9 18
9 10 20
I would like to add a column z whose value is the index of the first y value that meets the condition y >= x. For example, for row 2 (x = 3), the first y value greater than or equal to 3 is 4, so the output of z for row 2 is (index) 1. I expect the final table to look like:
x y z
0 1 2 0
1 2 4 0
2 3 6 1
3 4 8 1
4 5 10 2
5 6 12 2
6 7 14 3
7 8 16 3
8 9 18 4
9 10 20 4
It should be noted that both x and y are sorted, if that makes the solution easier.
I have seen a similar answer but I could not translate it to my situation.
CodePudding user response:
You want np.searchsorted, which assumes df['y'] is sorted. With its default side='left', it returns, for each x, the first position in y where y >= x:
import numpy as np
df['z'] = np.searchsorted(df['y'], df['x'])
Output:
x y z
0 1 2 0
1 2 4 0
2 3 6 1
3 4 8 1
4 5 10 2
5 6 12 2
6 7 14 3
7 8 16 3
8 9 18 4
9 10 20 4
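For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the frame from the question and compares np.searchsorted against a plain loop that follows the stated condition literally. The helper name first_index_at_least is made up for illustration, and the sketch assumes the default 0..n-1 index, so positions and index labels coincide.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question: x = 1..10, y = 2, 4, ..., 20.
df = pd.DataFrame({"x": range(1, 11), "y": range(2, 21, 2)})

# side="left" (the default) gives the first position i where y[i] >= x.
# This relies on y being sorted in ascending order.
df["z"] = np.searchsorted(df["y"].to_numpy(), df["x"].to_numpy(), side="left")


def first_index_at_least(y_values, x):
    """Hypothetical brute-force reference: first position with y >= x."""
    for i, y in enumerate(y_values):
        if y >= x:
            return i
    # No y is large enough; mirror searchsorted, which returns len(y) here.
    return len(y_values)


expected = [first_index_at_least(df["y"].tolist(), x) for x in df["x"]]

print(df)
print(df["z"].tolist() == expected)
```

One thing to watch: if some x is larger than every y, np.searchsorted returns len(y), which is not a valid row position, so that case may need separate handling. If the frame has a non-default index, something like df.index[positions] would be needed to turn positions into labels.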