How to find last occurrence of value meeting condition in column in python

Time:10-04

I have the following dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame({"A":['a','b','c','d','e','f','g','h','i','j','k'],
                    "B":[1,3,4,5,6,7,6,5,8,5,5]})
df

displayed as:

    A   B
0   a   1
1   b   3
2   c   4
3   d   5
4   e   6
5   f   7
6   g   6
7   h   5
8   i   8
9   j   5
10  k   5

I first want to find the letter in column "A" that corresponds to the first occurrence of a value in column "B" that is >= 6. From the table, this is the row at index 4, which has 6 in column "B" and "e" in column "A".

I can get that column "A" value with this code:

# Find first occurrence >= threshold
threshold = 6
array = df.values
array[np.where(array[:, 1] >= threshold)][0, 0]

This code returns 'e', which is what I want.

This code is adapted from the Stack Overflow question "Python find first occurrence in Pandas dataframe column 2 below threshold and return column 1 value same row using NumPy".

What I am having trouble with is modifying this code to find the last occurrence that meets my criterion of being >= the threshold of 6. For the dataframe above, I want to produce 'i', because the row containing "i" in column "A" corresponds to a value of 8 in column "B", which is the last value >= 6. I want to preserve the order of the rows, which is alphabetical by column "A".

I am guessing this means changing the indexing in my code, specifically the array[:,1] component or the [0,0] component, but I am not sure how to ask for the last occurrence. How can I modify my code to find the value in column "A" corresponding to the last occurrence of a value >= 6 in column "B"?

CodePudding user response:

To get the first occurrence, you can use idxmax:

df.loc[df['B'].ge(6).idxmax()]

output:

A    e
B    6
Name: 4, dtype: object

For just the value in 'A':

df.loc[df['B'].ge(6).idxmax(), 'A']

output: 'e'

For the last, do the same on the reversed Series:

df.loc[df.loc[::-1,'B'].ge(6).idxmax()]

output:

A    i
B    8
Name: 8, dtype: object

For just the value in 'A':

df.loc[df.loc[::-1, 'B'].ge(6).idxmax(), 'A']

output: 'i'
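
One caveat with the idxmax approach: on a boolean Series that is all False, idxmax still returns the first label, so a threshold that nothing meets would look like a match. Below is a minimal sketch that guards against that case. The helper name first_and_last_match is hypothetical and only for illustration, and the dataframe is the one from the question.

```python
import pandas as pd

df = pd.DataFrame({"A": list("abcdefghijk"),
                   "B": [1, 3, 4, 5, 6, 7, 6, 5, 8, 5, 5]})


def first_and_last_match(frame, threshold):
    """Return the 'A' values of the first and last rows where B >= threshold.

    Returns (None, None) when no row qualifies.
    (Hypothetical helper, not part of pandas.)
    """
    mask = frame["B"].ge(threshold)
    if not mask.any():
        # idxmax on an all-False mask would return the first label,
        # which would wrongly look like a match.
        return None, None
    first_label = mask.idxmax()        # first True, scanning top to bottom
    last_label = mask[::-1].idxmax()   # first True, scanning bottom to top
    return frame.loc[first_label, "A"], frame.loc[last_label, "A"]


first, last = first_and_last_match(df, 6)
print(first, last)
```

For the question's dataframe and a threshold of 6, this gives 'e' for the first match and 'i' for the last. Reversing the mask keeps the original index labels, so the label returned by idxmax can be passed straight to df.loc.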

CodePudding user response:

Here is one way to do it.

Filter the rows that meet your criterion, then take the last row of the result:

df.loc[df['B'] >= 6][-1:]

With the dataframe above, this returns:

    A   B
8   i   8
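
If you would rather stay close to the NumPy code in the question, the same idea applies: filter first, then index the last matching row with -1 instead of 0. This is only a sketch against the question's dataframe. The variable names matches, hits, last_numpy and last_pandas are mine, and the empty-result checks are an added assumption about what you want when nothing meets the threshold.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": list("abcdefghijk"),
                   "B": [1, 3, 4, 5, 6, 7, 6, 5, 8, 5, 5]})
threshold = 6

# NumPy route: same filter as in the question, but take the last matching row.
array = df.values
matches = array[array[:, 1] >= threshold]   # rows with B >= threshold, original order
last_numpy = matches[-1, 0] if len(matches) else None

# pandas route: boolean mask on B, then the last element of A by position.
hits = df.loc[df["B"] >= threshold, "A"]
last_pandas = hits.iloc[-1] if not hits.empty else None

print(last_numpy, last_pandas)
```

Both routes keep the rows in their original order, so "last" means the last row in the dataframe that meets the condition, not the largest value.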