I have a DataFrame where, for each column, I need to find the index of the first value less than 5. I found a solution that works for an individual list/Series, but I can't apply it to an entire DataFrame in a Pythonic way without breaking each column into its own variable in a for loop.
import pandas as pd
Data = pd.DataFrame({0: [2, 2, 3, 2, 2, 3, 5],
                     1: [8, 7, 7, 8, 7, 7, 7],
                     2: [9, 7, 7, 4, 4, 4, 9]})
The desired output would be [0,999,3] where 999 is a flag for not finding a value < 5. This code works for an individual series/list:
next((x for x, val in enumerate(DataS) if val < 5),999)
but when I try to apply this over all the columns I can't get it to work:
Data.apply(lambda x: next((x for x, val in enumerate(Data) if val < 5),999))
This code returns a value of 0 for every column. Can someone help me understand why apply/lambda aren't behaving how I think they should?
As a bonus, this function also appears to skip over NaN values. Is there a different way to write this so that NaNs are flagged as well?
CodePudding user response:
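Let us use idxmax to find the index of the first value < 5 in each column, then mask the index with 999 if there is no such value < 5. On a boolean mask, idxmax returns the index of the first True, because True is the maximum and idxmax reports its first occurrence.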
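# Boolean mask: True where the value is < 5 (Data is the DataFrame from the question)
m = Data.lt(5)
# Index of the first True per column; 999 where the column has no True at all
m.idxmax().where(m.any(), 999)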
0 0
1 999
2 3
dtype: int64
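As for why the apply/lambda attempt returns 0 for every column: inside the lambda, enumerate(Data) iterates over the whole DataFrame, and iterating a DataFrame yields its column labels (0, 1, 2), not the values of the column that was passed in. The first label, 0, is already < 5, so position 0 comes back every time. Below is a minimal sketch of the corrected apply call, plus one possible way to treat NaN as a hit by OR-ing the mask with isna(). The names first_lt5, with_nan and first_hit, and the NaN placed at row 1 of column 2, are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

Data = pd.DataFrame({0: [2, 2, 3, 2, 2, 3, 5],
                     1: [8, 7, 7, 8, 7, 7, 7],
                     2: [9, 7, 7, 4, 4, 4, 9]})

# Enumerate the column handed to the lambda (col), not the whole DataFrame
first_lt5 = Data.apply(
    lambda col: next((i for i, val in enumerate(col) if val < 5), 999)
)
print(first_lt5.tolist())  # [0, 999, 3]

# NaN < 5 evaluates to False, so NaNs are silently skipped by the mask.
# Assumption for illustration: a NaN inserted at row 1 of column 2.
with_nan = Data.astype(float)
with_nan.loc[1, 2] = np.nan

# Count a cell as a hit if it is < 5 OR missing
m = with_nan.lt(5) | with_nan.isna()
first_hit = m.idxmax().where(m.any(), 999)
print(first_hit.tolist())  # [0, 999, 1]
```

Note that enumerate gives positions while idxmax gives index labels; the two agree here only because the DataFrame uses the default 0..n-1 index.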