I need to find which row a value should be placed in, in a sorted dataframe.
For example,
value = [3, 2.5]
df = pd.DataFrame({'x':[1,1,1,2,2,2,3,3,3], 'y':[1,2,3,1,2,3,1,2,3]})
Since value has x = 3 and y = 2.5, the right place for this value is between the 8th and 9th rows.
I want to return 8 (lower index) in this case.
I tried to think of a solution, but I really need help with this.
CodePudding user response:
This is one way, using bisect_left():
from bisect import bisect_left
i = bisect_left(list(zip(df.x, df.y)), tuple(value))
print(df.index[i])
Output:
8
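One thing to watch with the snippet above: bisect_left returns a position, and that position can equal len(df) when the value sorts after every row, in which case df.index[i] raises an IndexError. Below is a minimal sketch that returns the position itself and also allows a right-hand bisection. The helper name insertion_row and its side argument are my own, not part of pandas or bisect, and it assumes the frame is already sorted by its columns in column order.

```python
from bisect import bisect_left, bisect_right

import pandas as pd


def insertion_row(df, value, side="left"):
    """Return the positional row at which `value` would be inserted.

    Hypothetical helper: assumes df is already sorted lexicographically by
    its columns (in column order) and that value has one entry per column.
    """
    # Plain tuples compare lexicographically, which matches the sort order.
    rows = list(df.itertuples(index=False, name=None))
    search = bisect_left if side == "left" else bisect_right
    return search(rows, tuple(value))


df = pd.DataFrame({"x": [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   "y": [1, 2, 3, 1, 2, 3, 1, 2, 3]})

print(insertion_row(df, [3, 2.5]))         # 8
print(insertion_row(df, [3, 2], "left"))   # 7, before the existing (3, 2) row
print(insertion_row(df, [3, 2], "right"))  # 8, after the existing (3, 2) row
print(insertion_row(df, [4, 0]))           # 9, past the last row
```

Building the list of tuples is O(n), so this is fine for a one-off lookup but wasteful if you search the same frame many times; in that case build the list once and reuse it.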
CodePudding user response:
You can use Series.searchsorted, which also applies a bisection algorithm under the hood.
You can generalize it to a DataFrame with multiple columns using the following function:
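import pandas as pd
value = [3, 2.5]
df = pd.DataFrame({'x':[1,1,1,2,2,2,3,3,3], 'y':[1,2,3,1,2,3,1,2,3]})
def bisect(df, arr, side='left'):
    lo, hi = 0, len(df)  # candidate rows are lo .. hi-1
    for col_idx, val in zip(range(df.shape[1]), arr):
        col = df.iloc[lo:hi, col_idx]
        # keep only the rows equal to val in this column; add lo because
        # searchsorted returns a position relative to the slice
        lo, hi = lo + col.searchsorted(val, 'left'), lo + col.searchsorted(val, 'right')
    return lo if side == 'left' else hi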
Output:
>>> bisect(df, value)
8
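To see why a column-wise search has to be restricted to a block of rows, here is the same lookup done by hand for value = [3, 2.5]. It is only a sketch of the reasoning: the names lo, hi, block and offset are mine, and it assumes the frame is sorted by x and then by y.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   "y": [1, 2, 3, 1, 2, 3, 1, 2, 3]})

# Step 1: x is sorted, so the rows with x == 3 form one contiguous block.
lo = df["x"].searchsorted(3, side="left")   # first row with x >= 3
hi = df["x"].searchsorted(3, side="right")  # first row with x > 3

# Step 2: y is NOT sorted over the whole frame, only inside each x block,
# so the second bisection is valid only on that slice.
print(df["y"].is_monotonic_increasing)      # False
block = df["y"].iloc[lo:hi]
offset = block.searchsorted(2.5, side="left")

# searchsorted returns a position relative to the slice, so shift it by lo.
print(lo, hi, offset, lo + offset)          # 6 9 2 8
```

Searching y from row lo to the end of the frame would give the right answer here only because x == 3 is the last block; for an earlier block the slice would run into the next group, where y starts over at 1.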