I am working with pandas and want to select, for each group in column "A", only the two rows with the highest values in column "B".
How can I do this inside the groupby, without reset_index and filter?
import pandas as pd
df = pd.DataFrame({
    'A': list('aaabbbbcccc'),
    'B': [0, 1, 2, 5, 7, 2, 1, 4, 1, 0, 2],
    'V': range(10, 120, 10)
})
df
My attempt
df.groupby(['A','B'])['V'].sum()
Required output:
A  B
a  1     20
   2     30
b  5     40
   7     50
c  2    110
   4     80
CodePudding user response:
IIUC, you want the rows with the two highest B values per A.
You can compute a descending rank per group and keep the rows whose rank is ≤ 2.
df[df.groupby('A')['B'].rank('first', ascending=False).le(2)]
Output:
    A  B    V
1   a  1   20
2   a  2   30
3   b  5   40
4   b  7   50
7   c  4   80
10  c  2  110
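To see what the rank-based mask above is doing, it can help to split the one-liner into steps. This is only a sketch on the question's data; the names `rank_in_A`, `mask`, and `top2` are made up for illustration. With `method='first'`, ties in B should be broken by order of appearance, so a group with at least two non-null B values gets two rows with rank ≤ 2.

```python
import pandas as pd

df = pd.DataFrame({
    'A': list('aaabbbbcccc'),
    'B': [0, 1, 2, 5, 7, 2, 1, 4, 1, 0, 2],
    'V': range(10, 120, 10),
})

# Rank B within each A group, largest value first (rank 1 = highest B).
rank_in_A = df.groupby('A')['B'].rank(method='first', ascending=False)

# Boolean mask aligned with df: True for the two highest-B rows of each group.
mask = rank_in_A.le(2)

# Boolean indexing keeps the original row order and index labels.
top2 = df[mask]
print(top2)
```

Because the mask is aligned on the original index, no reset_index is needed and the rows stay in their original order.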
CodePudding user response:
Try sorting by A and B, then taking the last two rows of each group:
df.sort_values(['A', 'B']).groupby(['A']).tail(2)
Output:
    A  B    V
1   a  1   20
2   a  2   30
3   b  5   40
4   b  7   50
10  c  2  110
7   c  4   80
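If you need the exact shape shown in the question (V values indexed by A and B), one possible follow-up is to move A and B into the index after taking the tail. This is a sketch on the question's data, not the only way to do it; the name `result` is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': list('aaabbbbcccc'),
    'B': [0, 1, 2, 5, 7, 2, 1, 4, 1, 0, 2],
    'V': range(10, 120, 10),
})

result = (
    df.sort_values(['A', 'B'])    # order rows by A, then by B ascending
      .groupby('A')
      .tail(2)                    # last two rows per group = two highest B
      .set_index(['A', 'B'])['V'] # Series of V with an (A, B) MultiIndex
)
print(result)
```

Sorting first means the two rows kept per group are already in ascending B order, which matches the layout of the required output.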