How to compare given value with Pandas dataframe values without using a for loop

Time:02-10

I have the following example dataframe:

import pandas as pd

d = {'target': [1, 2, 4, 3, 6, 5]}
df = pd.DataFrame(data=d)
df

Output:

    target
0   1
1   2
2   4
3   3
4   6
5   5

I need a function, find_index_of_first_hit(value), that does the following:

  • compares the input value with the elements of the target column,
  • finds the first column value that is greater than or equal to the input value,
  • returns the index of the dataframe row of that first match.

Example:

find_index_of_first_hit(3)

This should return 2: the value 4 at index 2 is the first value in the target column that is >= the input value 3.

  • The function should return -1 if none of the column values is >= the input value.

The original dataframe is fairly large, so I wonder how I can write this without using a for loop.

The function needs to be written in Python and it needs to be fast, which is why I would like to avoid a for loop.

How can I write such a function?

CodePudding user response:

Use Series.idxmax to get the index of the first match, and check with Series.any in an if-else whether any value matches at all:

def find_index_of_first_hit(val):
    a = df['target'].ge(val)
    return a.idxmax() if a.any() else -1

print(find_index_of_first_hit(3))
2
print(find_index_of_first_hit(30))
-1
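
For reference, here is a self-contained sketch of the same idea that does not rely on a global df. Passing the dataframe and the column name as parameters is my own assumption, not part of the question, and the NumPy route (to_numpy plus argmax) is a swapped-in alternative to idxmax, not necessarily faster on your data. It assumes the column contains no missing values.

```python
import pandas as pd


def find_index_of_first_hit(df, val, column="target"):
    """Return the index label of the first row where df[column] >= val, else -1.

    The df and column parameters are illustrative additions.
    """
    mask = df[column].to_numpy() >= val   # boolean NumPy array, no Python loop
    if not mask.any():                    # no value reaches val
        return -1
    pos = int(mask.argmax())              # position of the first True
    return df.index[pos]                  # map the position back to the index label


df = pd.DataFrame({"target": [1, 2, 4, 3, 6, 5]})
print(find_index_of_first_hit(df, 3))     # 2
print(find_index_of_first_hit(df, 30))    # -1
```

Returning df.index[pos] rather than pos keeps the result correct when the dataframe does not have the default 0..n-1 index.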

CodePudding user response:

Use a greater-than-or-equal check, .ge, with idxmax.

You'll find you rarely need to write any functions for Pandas (unless you need to package up reusable code snippets) as most things are available in the API.

index = df.ge(3).idxmax()

target    2
dtype: int64
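
One caveat with the one-liner, shown as a small sketch on the example data from the question: when no value matches, the boolean mask is all False and idxmax still returns the first index label instead of -1. The variable names below are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"target": [1, 2, 4, 3, 6, 5]})

hit = df["target"].ge(3)      # at least one True
miss = df["target"].ge(30)    # all False

first_hit = hit.idxmax()      # label of the first True
first_miss = miss.idxmax()    # all False: idxmax falls back to the first label

print(first_hit, hit.any())    # 2 True
print(first_miss, miss.any())  # 0 False
```

So a result of 0 is ambiguous on its own; an any() check is still needed if -1 is required for the no-match case.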