Home > Back-end >  How keep latest records in dataframe according to last version using pandas
How keep latest records in dataframe according to last version using pandas

Time:11-23

I have a dataframe like this

Id     b_num    b_type     b_ver   price
100    55        A          0       100
101    55        A          0        50    
102    55        A          1       100
103    55        A          1        60 
104    30        C          2       100
105    30        C          2        50    
106    30        C          2       100
107    30        C          2        60
108    30        C          4       200
109    30        C          4        55    
110    30        C          4        80
111    30        C          4       120
112    30        C          4        20

I would like to keep the latest version of b_num and b_type, b_ver is the number of version

The output expected:

Id     b_num    b_type     b_ver   price
102    55        A          1       100
103    55        A          1        60 
108    30        C          4       200
109    30        C          4        55    
110    30        C          4        80
111    30        C          4       120
112    30        C          4        20

thanks

CodePudding user response:

Consider trying with:

df.merge(df.groupby(['b_num','b_type'],as_index=False)['b_ver'].last(),
         on=['b_num','b_type','b_ver'])

Outputting:

    Id  b_num b_type  b_ver  price
0  102     55      A      1    100
1  103     55      A      1     60
2  108     30      C      4    200
3  109     30      C      4     55
4  110     30      C      4     80
5  111     30      C      4    120
6  112     30      C      4     20

CodePudding user response:

Using groupby.rank()

df.loc[df.groupby(['b_num','b_type'])['b_ver'].rank(ascending=False,method = 'dense').eq(1)]

using groupby.nlargest()

df.loc[df.groupby(['b_num','b_type'])['b_ver'].nlargest(1,keep = 'all').droplevel([0,1]).index]

using groupby.transform()

df.loc[df.groupby(['b_num','b_type'])['b_ver'].transform('max').eq(df['b_ver'])]

Output:

     Id  b_num b_type  b_ver  price
2   102     55      A      1    100
3   103     55      A      1     60
8   108     30      C      4    200
9   109     30      C      4     55
10  110     30      C      4     80
11  111     30      C      4    120
12  112     30      C      4     20
  • Related