I have the following dataframe:
import pandas as pd
df = pd.DataFrame({"A":['a','b','c','d','e','f','g','h','i','j','k'],
                   "B":[1,3,4,5,6,7,6,5,8,5,5]})
df
displayed as:
A B
0 a 1
1 b 3
2 c 4
3 d 5
4 e 6
5 f 7
6 g 6
7 h 5
8 i 8
9 j 5
10 k 5
I first want to find the letter in column "A" that corresponds to the first occurrence of a value in column "B" that is >= 6. Here that is row index 4, with a value of 6 in column "B" and "e" in column "A".
I can get that column "A" value with this code:
import numpy as np
# Find the first occurrence >= threshold
threshold = 6
array = df.values
array[np.where(array[:,1] >= threshold)][0,0]
This code returns 'e', which is what I want.
This code is adapted from the Stack Overflow question "Python find first occurrence in Pandas dataframe column 2 below threshold and return column 1 value same row using NumPy".
What I am having trouble figuring out is how to modify this code to find the last occurrence meeting my criterion of being >= the threshold of 6. With the data frame above, I want to produce 'i', because the row containing "i" in column "A" corresponds to a value of 8 in column "B", which is the last occurrence of a value >= 6. I want to preserve the existing row order, which is alphabetical by column "A". I am guessing the fix involves modifying the indexing in my code, specifically the array[:,1] component or the [0,0] component, but I am not sure how to ask for the last occurrence meeting my criterion. How can I modify my code to find the value in column "A" corresponding to the last occurrence of a value >= 6 in column "B"?
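A minimal sketch of the indexing change the question is guessing at, assuming the same `df` and `threshold` as above. The mask `array[:,1] >= threshold` keeps the matching rows in their original order, so taking row `-1` of the filtered array instead of row `0` should give the last match. The intermediate name `rows` is introduced here only for readability.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k'],
                   "B": [1, 3, 4, 5, 6, 7, 6, 5, 8, 5, 5]})
threshold = 6

array = df.values
# np.where returns the positions of rows where column "B" >= threshold, in order
rows = array[np.where(array[:, 1] >= threshold)]
first = rows[0, 0]   # first matching row, column "A"
last = rows[-1, 0]   # last matching row, column "A"
print(first, last)
```

Only the row index changes; the column index `0` still picks column "A".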
Answer:
To get the first occurrence, you can use idxmax on a boolean mask; it returns the label of the first True value:
df.loc[df['B'].ge(6).idxmax()]
output:
A e
B 6
Name: 4, dtype: object
For just the value in 'A':
df.loc[df['B'].ge(6).idxmax(), 'A']
output: 'e'
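One caveat: if no value meets the condition, `idxmax` on an all-False mask returns the label of the first row instead of signalling "no match", which would silently give 'a' here. A small sketch with an `.any()` guard, where the helper name `first_at_or_above` is hypothetical and not part of the answer:

```python
import pandas as pd

df = pd.DataFrame({"A": list("abcdefghijk"),
                   "B": [1, 3, 4, 5, 6, 7, 6, 5, 8, 5, 5]})

def first_at_or_above(frame, threshold):
    """Return column 'A' for the first row where 'B' >= threshold, or None."""
    mask = frame['B'].ge(threshold)
    if not mask.any():
        # idxmax on an all-False mask would return the first label, not "no match"
        return None
    return frame.loc[mask.idxmax(), 'A']

print(first_at_or_above(df, 6))    # e
print(first_at_or_above(df, 100))  # None
```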
For the last, do the same on the reversed Series:
df.loc[df.loc[::-1,'B'].ge(6).idxmax()]
output:
A i
B 8
Name: 8, dtype: object
df.loc[df.loc[::-1, 'B'].ge(6).idxmax(), 'A']
output: 'i'
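An alternative sketch for the last occurrence that avoids reversing the Series (a swapped-in technique, not the one from this answer): replace non-matching values with NaN using `Series.where`, then ask for `last_valid_index()`, which returns the label of the last non-NaN entry, or `None` when nothing matches.

```python
import pandas as pd

df = pd.DataFrame({"A": list("abcdefghijk"),
                   "B": [1, 3, 4, 5, 6, 7, 6, 5, 8, 5, 5]})

# Keep values of B only where B >= 6; every other row becomes NaN
matches = df['B'].where(df['B'].ge(6))
last_label = matches.last_valid_index()  # label of the last non-NaN row, or None

if last_label is not None:
    print(df.loc[last_label, 'A'])  # i
```

The `None` return makes the no-match case explicit, unlike `idxmax`.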
Answer:
Here is one way to do it: search for the rows meeting your criterion, then take the last row of the result set:
df.loc[df['B'] >= 6][-1:]
output:
A B
8 i 8
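Note that `[-1:]` returns a one-row DataFrame rather than the letter itself. To get just the value in column "A", a sketch along the same lines selects the column inside `.loc` and takes the last element positionally with `.iloc[-1]` (this raises `IndexError` if no row matches):

```python
import pandas as pd

df = pd.DataFrame({"A": list("abcdefghijk"),
                   "B": [1, 3, 4, 5, 6, 7, 6, 5, 8, 5, 5]})

# Rows where B >= 6, keeping only column A (a Series in original row order)
matching_a = df.loc[df['B'] >= 6, 'A']
print(matching_a.iloc[0])   # first match: e
print(matching_a.iloc[-1])  # last match: i
```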