I have a dataframe as below:
0.1 0.65
0.2 0.664
0.3 0.606
0.4 0.587
0.5 0.602
0.6 0.59
0.7 0.53
I need to find the first occurrence of a value below 0.6 in column 2 and return the value of column 1 in the same row. In this example the returned value would be 0.4.
How could I do this using NumPy or SciPy?
My current code is:
import pandas as pd
df = pd.DataFrame([*zip([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7], [0.65, 0.664, 0.606, 0.587, 0.602, 0.59, 0.53])])
threshold = 0.6
var = df[df[1] < threshold].head(1)[0]
res = var.iloc[0]
CodePudding user response:
You can use masking and the df.head() function to get the first occurrence given the threshold.
df[df[1] < threshold].head(1)[0]
3 0.4
Name: 0, dtype: float64
Update
To use NumPy, convert the DataFrame to a NumPy array and use np.where.
import numpy as np
array = df.values
array[np.where(array[:,1] < threshold)][0,0]
0.4
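One caveat with the indexing above: if no row is below the threshold, the `[0, 0]` lookup raises an IndexError. Below is a minimal sketch of a guarded version. It assumes a hypothetical helper named `first_below` (not part of NumPy or pandas) and assumes that returning None is acceptable when nothing matches.

```python
import numpy as np
import pandas as pd


def first_below(df, threshold):
    """Return the column-0 value of the first row whose column-1 value
    is below threshold, or None when no row qualifies.

    Hypothetical helper for illustration only.
    """
    array = df.to_numpy()
    # Row positions where the condition holds, in row order
    hits = np.flatnonzero(array[:, 1] < threshold)
    if hits.size == 0:
        return None
    return array[hits[0], 0]


df = pd.DataFrame([*zip([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7],
                        [0.65, 0.664, 0.606, 0.587, 0.602, 0.59, 0.53])])

print(first_below(df, 0.6))  # first match is the row holding 0.587
print(first_below(df, 0.1))  # no value in column 1 is below 0.1
```

`np.flatnonzero` is used here instead of `np.where` only because it returns the matching positions as a flat array, which makes the empty check straightforward.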
To compare performance, we can time the two approaches.
# Pandas style
def function1(df):
    return df[df[1] < threshold].head(1)[0]
# Numpy style
def function2(df):
    array = df.values
    return array[np.where(array[:,1] < threshold)][0,0]
%timeit function1(df)
322 µs ± 6.71 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit function2(df)
11.8 µs ± 209 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
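`%timeit` is an IPython magic, so the timing lines above only run in IPython or Jupyter. Below is a rough sketch of the same comparison as a plain script using the standard-library `timeit` module. The names `pandas_style` and `numpy_style` are placeholders, and both are written to return a scalar so the comparison is like for like (an adjustment that is not in the measurements above). Actual timings will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame([*zip([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7],
                        [0.65, 0.664, 0.606, 0.587, 0.602, 0.59, 0.53])])
threshold = 0.6


def pandas_style(df):
    # Boolean mask, keep the first matching row, take column 0 as a scalar
    return df[df[1] < threshold].head(1)[0].iloc[0]


def numpy_style(df):
    # Work on the underlying array and index the first matching row
    array = df.to_numpy()
    return array[np.where(array[:, 1] < threshold)][0, 0]


n = 1000  # number of calls per measurement (arbitrary choice)
t_pandas = timeit.timeit(lambda: pandas_style(df), number=n)
t_numpy = timeit.timeit(lambda: numpy_style(df), number=n)

print(f"pandas: {t_pandas / n * 1e6:.1f} µs per call")
print(f"numpy:  {t_numpy / n * 1e6:.1f} µs per call")
```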