I have a dataframe like this
Id b_num b_type b_ver price
100 55 A 0 100
101 55 A 0 50
102 55 A 1 100
103 55 A 1 60
104 30 C 2 100
105 30 C 2 50
106 30 C 2 100
107 30 C 2 60
108 30 C 4 200
109 30 C 4 55
110 30 C 4 80
111 30 C 4 120
112 30 C 4 20
I would like to keep the latest version of b_num and b_type, b_ver is the number of version
The output expected:
Id b_num b_type b_ver price
102 55 A 1 100
103 55 A 1 60
108 30 C 4 200
109 30 C 4 55
110 30 C 4 80
111 30 C 4 120
112 30 C 4 20
thanks
CodePudding user response:
Consider trying with:
df.merge(df.groupby(['b_num','b_type'],as_index=False)['b_ver'].last(),
on=['b_num','b_type','b_ver'])
Outputting:
Id b_num b_type b_ver price
0 102 55 A 1 100
1 103 55 A 1 60
2 108 30 C 4 200
3 109 30 C 4 55
4 110 30 C 4 80
5 111 30 C 4 120
6 112 30 C 4 20
CodePudding user response:
Using groupby.rank()
df.loc[df.groupby(['b_num','b_type'])['b_ver'].rank(ascending=False,method = 'dense').eq(1)]
using groupby.nlargest()
df.loc[df.groupby(['b_num','b_type'])['b_ver'].nlargest(1,keep = 'all').droplevel([0,1]).index]
using groupby.transform()
df.loc[df.groupby(['b_num','b_type'])['b_ver'].transform('max').eq(df['b_ver'])]
Output:
Id b_num b_type b_ver price
2 102 55 A 1 100
3 103 55 A 1 60
8 108 30 C 4 200
9 109 30 C 4 55
10 110 30 C 4 80
11 111 30 C 4 120
12 112 30 C 4 20