Python find first occurrence in Pandas dataframe column 2 below threshold and return column 1 value

Time:09-18

I have a dataframe as below:

0.1   0.65
0.2   0.664
0.3   0.606
0.4   0.587
0.5   0.602
0.6   0.59
0.7   0.53

I need to find the first occurrence of a value below 0.6 in column 2 and return the value of column 1 on the same row. In this example the returned value would be 0.4.

How could I do this using NumPy or SciPy?

My current code is:

import pandas as pd

df = pd.DataFrame([*zip([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7], [0.65, 0.664, 0.606, 0.587, 0.602, 0.59, 0.53])])

threshold = 0.6
var = df[df[1] < threshold].head(1)[0]
res = var.iloc[0]
    

CodePudding user response:

You can use boolean masking and df.head() to get the first row that falls below the threshold.

df[df[1] < threshold].head(1)[0]

3    0.4
Name: 0, dtype: float64
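
Note that the expression above returns a one-row Series rather than a bare number, and it comes back empty if no row is below the threshold. A minimal sketch of a variant that returns a scalar and handles the no-match case is shown here; the helper name first_below is hypothetical, and returning None when nothing matches is an assumption about the desired behavior.

```python
import pandas as pd

df = pd.DataFrame({
    0: [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7],
    1: [0.65, 0.664, 0.606, 0.587, 0.602, 0.59, 0.53],
})

def first_below(df, threshold):
    """Hypothetical helper: column-0 value of the first row whose column 1 is below threshold."""
    mask = df[1] < threshold
    if not mask.any():
        # Assumption: signal "no match" with None instead of raising
        return None
    # idxmax on a boolean Series gives the index label of the first True
    return df.loc[mask.idxmax(), 0]

print(first_below(df, 0.6))
```

The mask.any() check matters because idxmax on an all-False mask would otherwise point at the first row, which is not a match.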

Update

To use NumPy, convert the DataFrame to a NumPy array and use np.where.

import numpy as np

array = df.values

array[np.where(array[:,1] < 0.6)][0,0]
0.4
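
The np.where expression raises an IndexError when no row is below the threshold, because the filtered array is empty. A possible guard is sketched below using np.flatnonzero; the helper name first_below_np and the choice to return None on no match are assumptions, not part of the original answer.

```python
import numpy as np

values = np.array([
    [0.1, 0.65],
    [0.2, 0.664],
    [0.3, 0.606],
    [0.4, 0.587],
    [0.5, 0.602],
    [0.6, 0.59],
    [0.7, 0.53],
])

def first_below_np(values, threshold):
    """Hypothetical helper: first column-0 value whose column-1 entry is below threshold."""
    # Positions of all rows where column 1 is below the threshold, in order
    idx = np.flatnonzero(values[:, 1] < threshold)
    if idx.size == 0:
        # Assumption: return None when nothing matches
        return None
    return values[idx[0], 0]

print(first_below_np(values, 0.6))
```

With a DataFrame, the same helper can be called as first_below_np(df.values, threshold).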

To compare performance, we can time the two approaches.

# Pandas style (returns a one-row Series)
def function1(df):
    return df[df[1] < threshold].head(1)[0]

# NumPy style (returns a scalar)
def function2(df):
    array = df.values
    return array[np.where(array[:, 1] < threshold)][0, 0]

%timeit function1(df)
322 µs ± 6.71 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit function2(df)
11.8 µs ± 209 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
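
%timeit is an IPython magic, so the timings above only run in IPython or Jupyter. A rough equivalent using the standard-library timeit module is sketched below for use in a plain script; the function names pandas_style and numpy_style and the repeat count n are illustrative choices. Results will differ by machine and library version, and on a 7-row frame they mostly reflect per-call overhead rather than how each approach scales.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({
    0: [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7],
    1: [0.65, 0.664, 0.606, 0.587, 0.602, 0.59, 0.53],
})
threshold = 0.6

def pandas_style(df):
    # Boolean mask, then take the first matching row (one-row Series)
    return df[df[1] < threshold].head(1)[0]

def numpy_style(df):
    # Convert to a NumPy array and index the first matching row (scalar)
    array = df.values
    return array[np.where(array[:, 1] < threshold)][0, 0]

n = 1000  # illustrative number of calls per measurement
t_pandas = timeit.timeit(lambda: pandas_style(df), number=n) / n
t_numpy = timeit.timeit(lambda: numpy_style(df), number=n) / n

print(f"pandas: {t_pandas * 1e6:.1f} µs per call")
print(f"numpy:  {t_numpy * 1e6:.1f} µs per call")
```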