Find index of first value less than threshold for all columns

Time:07-12

I have a DataFrame where, for each column, I need to find the index of the first value less than 5. I found a solution that works for an individual list/Series, but I can't apply it to the entire DataFrame in a Pythonic way without breaking each column into its own variable via a for loop.

import pandas as pd

Data = pd.DataFrame({0: [2, 2, 3, 2, 2, 3, 5],
                     1: [8, 7, 7, 8, 7, 7, 7],
                     2: [9, 7, 7, 4, 4, 4, 9]})

The desired output would be [0, 999, 3], where 999 is a flag for not finding a value < 5. This code works for an individual Series or list (DataS here):

next((x for x, val in enumerate(DataS) if val < 5),999)

but when I try to apply this over all the columns I can't get it to work:

Data.apply(lambda x: next((x for x, val in enumerate(Data) if val < 5),999))

This code returns a value of 0 for every column. Can someone help me understand why apply/lambda aren't behaving how I think they should?

As a bonus, this function also appears to skip over NaN values. Is there a different way to write this to flag NaNs?

CodePudding user response:

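Build a boolean mask of the values < 5 and call idxmax on it, which returns the index of the first True in each column. Because idxmax returns the first index even when a column has no True at all, mask the result with 999 wherever the column contains no value < 5: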

m = Data.lt(5)  # boolean mask: True where a value is below 5
m.idxmax().where(m.any(), 999)  # first True per column; 999 if a column has none

0      0
1    999
2      3
dtype: int64
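
As for why the apply attempt in the question returns 0 for every column: the generator iterates over Data (the whole DataFrame) instead of the column that apply passes to the lambda, and iterating over a DataFrame yields its column labels 0, 1, 2. The first label, 0, is less than 5, so next returns position 0 every time. A minimal sketch of a corrected version follows; first_below is a hypothetical helper name, not something from the question. The second half shows one possible reading of the NaN bonus, assuming a NaN should count as a hit. NaN < 5 evaluates to False, which is why NaNs get skipped, so the mask is combined with isna(). The NaN placed in column 1 is made-up test data.

```python
import numpy as np
import pandas as pd

Data = pd.DataFrame({0: [2, 2, 3, 2, 2, 3, 5],
                     1: [8, 7, 7, 8, 7, 7, 7],
                     2: [9, 7, 7, 4, 4, 4, 9]})

def first_below(col, threshold=5, missing=999):
    """Position of the first value below threshold, or `missing` if none."""
    # Iterate over the column handed in by apply, not over the DataFrame.
    return next((i for i, val in enumerate(col) if val < threshold), missing)

result = Data.apply(first_below)
print(result.tolist())  # [0, 999, 3]

# Assumption: "flagging" a NaN means treating it as a match.
Data_nan = Data.astype(float)
Data_nan.iloc[1, 1] = np.nan  # made-up NaN in column 1, row 1

m = Data_nan.lt(5) | Data_nan.isna()  # True where value < 5 or missing
flagged = m.idxmax().where(m.any(), 999)
print(flagged.tolist())  # [0, 1, 3]
```

Note that enumerate gives positions while idxmax gives index labels. The two agree here only because the DataFrame uses the default 0..n-1 index.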