I have the following example dataframe:
import pandas as pd

d = {'target': [1, 2, 4, 3, 6, 5]}
df = pd.DataFrame(data=d)
df
Output:
   target
0       1
1       2
2       4
3       3
4       6
5       5
I need a function, find_index_of_first_hit(value), that does the following:
- compares the input value with the elements of the target column,
- searches for the first column value that is greater than or equal to value,
- and returns the index of the dataframe row for that very first match.
Example:
find_index_of_first_hit(3)
should return 2. Index 2 holds the target value 4, which is the first value in the column that is >= the input value 3.
- The function is expected to return -1 if none of the column values is >= the input value.
The original dataframe is fairly large and performance is important here, so I would like to avoid a for loop.
How can I write such a function in Python?
CodePudding user response:
Use Series.idxmax, and test with Series.any in an if-else whether any matching value exists:
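def find_index_of_first_hit(val):
    # Boolean mask: True where target >= val
    a = df['target'].ge(val)
    # idxmax gives the label of the first True; fall back to -1 if there is none
    return a.idxmax() if a.any() else -1

print(find_index_of_first_hit(3))
2
print(find_index_of_first_hit(30))
-1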
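If the function is called many times on the same column, one possible variation (not part of the answer above, just a sketch) is to precompute the running maximum once and binary-search it on each call. The running maximum is non-decreasing, and the first position where it reaches the value is also the first position where the column itself does. The name find_index_of_first_hit_cummax is made up for illustration, and the sketch assumes the column contains no NaN values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'target': [1, 2, 4, 3, 6, 5]})

# Computed once: running maximum of the column (non-decreasing by construction).
# Assumption: the column has no NaN values.
running_max = np.maximum.accumulate(df['target'].to_numpy())

def find_index_of_first_hit_cummax(val):
    # Hypothetical helper name. Binary search for the first position
    # where the running maximum is >= val.
    pos = np.searchsorted(running_max, val, side='left')
    # pos == len(running_max) means no value is >= val.
    return df.index[pos] if pos < len(running_max) else -1

print(find_index_of_first_hit_cummax(3))   # 2
print(find_index_of_first_hit_cummax(30))  # -1
```

Each call then costs a binary search instead of a full scan, at the price of having to rebuild running_max whenever the column changes.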
CodePudding user response:
Use a greater-than-or-equal check, .ge, with idxmax.
You'll find you rarely need to write any functions for Pandas (unless you need to package up reusable code snippets) as most things are available in the API.
index = df.ge(3).idxmax()
print(index)
target    2
dtype: int64
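One thing to watch with the bare one-liner: when no value matches, the mask is all False and idxmax still returns the first label rather than -1, so the "no hit" case from the question needs an explicit check. A minimal sketch on the example dataframe, with mask and first_hit as illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'target': [1, 2, 4, 3, 6, 5]})

# No value in the column is >= 30, so the mask is all False.
mask = df['target'].ge(30)

# idxmax on an all-False mask returns the first label, not "not found".
print(mask.idxmax())  # 0

# Guard with any() to get the -1 the question asks for.
first_hit = mask.idxmax() if mask.any() else -1
print(first_hit)  # -1
```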