Home > Net >  pandas how to drop rows base on different columns and different conditions
pandas how to drop rows base on different columns and different conditions

Time:11-05

I have a df,you can get it by copy this code:

from io import StringIO
df = """
      RateCompany   gs  RB  ValIssueDate
 115    T           G   54  19580101
 116    T           G   54  19870101
 336    T           S   54  19580101
 337    T           S   54  19870101
 338    T           j   53  19970101
"""
df = pd.read_csv(StringIO(df.strip()), sep='\s ')

Now I want drop all the rows that has same RateCompany,gs,RB ,but ValIssueDate is larger than other row.

The output should be:

 RateCompany    gs  RB  ValIssueDate
         115    T           G   54  19580101
         336    T           S   54  19580101
         338    T           j   53  19970101

Any friend can help?

CodePudding user response:

Sort by the columns you want to compare, then drop_duplicates on the key columns:

df.sort_values('ValIssueDate').drop_duplicates(['RateCompany', 'gs', 'RB'])

Output:

    RateCompany gs  RB  ValIssueDate
115           T  G  54      19580101
336           T  S  54      19580101
338           T  j  53      19970101

CodePudding user response:

Here is another way using idxmin()

df.loc[df.groupby(['RateCompany', 'gs', 'RB'])['ValIssueDate'].idxmin()]


     RateCompany   gs   RB  ValIssueDate
115            T    G   54  19580101
336            T    S   54  19580101
338            T    j   53  19970101

CodePudding user response:

You could use groupby with min:

>>> df.groupby(["RateCompany", "gs", "RB"], as_index=False)["ValIssueDate"].min()

  RateCompany gs  RB  ValIssueDate
0           T  G  54      19580101
1           T  S  54      19580101
2           T  j  53      19970101
  • Related