how to find which row a value should be placed in a sorted dataframe

Time:06-21

I need to find the row at which a value should be placed in a sorted DataFrame.

For example,

value = [3, 2.5]

df = pd.DataFrame({'x':[1,1,1,2,2,2,3,3,3], 'y':[1,2,3,1,2,3,1,2,3]})

Since value has x = 3 and y = 2.5, the right place for it is between the 8th and 9th rows.

I want to return 8 (the lower of the two) in this case.

I tried to think of a solution, but I really need help with this.
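To pin down what is being asked: in lexicographic order (compare x first, then y), the wanted number is the count of rows that sort strictly before value. A brute-force O(n) check with boolean masks, written only for the two columns x and y of this example (the names before and position are illustrative), makes that concrete:

```python
import pandas as pd

value = [3, 2.5]
df = pd.DataFrame({"x": [1, 1, 1, 2, 2, 2, 3, 3, 3], "y": [1, 2, 3, 1, 2, 3, 1, 2, 3]})

# A row sorts before `value` if its x is smaller, or its x ties and its y is smaller.
before = (df["x"] < value[0]) | ((df["x"] == value[0]) & (df["y"] < value[1]))
position = int(before.sum())
print(position)  # 8: rows at index 0-7 sort before (3, 2.5)
```

This scans every row, so it is only a cross-check; the answers below use binary search instead.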

Answer:

This is one way, using bisect.bisect_left() on the rows as tuples, since tuples compare lexicographically:

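from bisect import bisect_left
# Tuples compare element by element (x first, then y), matching the sort order of df
i = bisect_left(list(zip(df.x, df.y)), tuple(value))
print(df.index[i])  # raises IndexError if i == len(df), i.e. value sorts after every row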

Output:

8

Answer:

You can use Series.searchsorted, which also uses a bisection (binary search) algorithm under the hood.

You can generalize it to a DataFrame with multiple columns using the following function:

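import pandas as pd

value = [3, 2.5]
df = pd.DataFrame({'x':[1,1,1,2,2,2,3,3,3], 'y':[1,2,3,1,2,3,1,2,3]})

def bisect(df, arr, side='left', sorter=None):
    row_idx = 0
    for col_idx, val in zip(range(df.shape[1]), arr):
        # Search this column from the running offset and add the result to it.
        # Caveat: iloc[row_idx:] also includes rows whose earlier columns differ from arr,
        # so this is reliable when the matching rows run to the end of df, as x == 3 does here.
        row_idx += df.iloc[row_idx:, col_idx].searchsorted(val, side, sorter)
    return int(row_idx)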

Output:

>>> bisect(df, value)
8
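One caveat with the loop above: df.iloc[row_idx:, col_idx] spans every remaining row, not only the rows whose earlier columns equal the earlier entries of arr. For [2, 5], the y slice starting at row 3 is [1, 2, 3, 1, 2, 3], which is not sorted, so searchsorted gives no reliable answer there (the correct position is 6). A minimal sketch of a fix, assuming the frame is sorted lexicographically by its columns in order, keeps a [lo, hi) window of rows that match the earlier columns and uses only Series.searchsorted; the name bisect_rows is hypothetical:

```python
import pandas as pd


def bisect_rows(df: pd.DataFrame, arr, side: str = "left") -> int:
    """Hypothetical helper: insertion position of arr in a lexicographically sorted df.

    Keeps a [lo, hi) window of rows whose earlier columns equal the earlier entries of arr.
    """
    lo, hi = 0, len(df)
    for col_idx, val in enumerate(arr):
        col = df.iloc[lo:hi, col_idx]  # sorted on this column within the window
        if col_idx == len(arr) - 1:
            # Last key column: apply the requested side inside the window.
            return lo + int(col.searchsorted(val, side=side))
        # Earlier columns: shrink the window to the rows equal to val.
        left = int(col.searchsorted(val, side="left"))
        right = int(col.searchsorted(val, side="right"))
        lo, hi = lo + left, lo + right
    return lo


df = pd.DataFrame({"x": [1, 1, 1, 2, 2, 2, 3, 3, 3], "y": [1, 2, 3, 1, 2, 3, 1, 2, 3]})
print(bisect_rows(df, [3, 2.5]))  # 8
print(bisect_rows(df, [2, 5]))    # 6
```

Narrowing with both sides on the non-final columns is what keeps each later search inside a sorted block. For a value with one entry per column, applying side only to the last column matches how bisect_left and bisect_right order tuples.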