I have a DataFrame; you can reproduce it by running this code:
import pandas as pd
from io import StringIO

data = """
RateCompany gs RB ValIssueDate
115 T G 54 19580101
116 T G 54 19870101
336 T S 54 19580101
337 T S 54 19870101
338 T j 53 19970101
"""
df = pd.read_csv(StringIO(data.strip()), sep=r'\s+')
Now I want to drop every row that shares the same RateCompany, gs, and RB as another row but has a larger ValIssueDate, i.e. keep only the earliest ValIssueDate per group.
The output should be:
RateCompany gs RB ValIssueDate
115 T G 54 19580101
336 T S 54 19580101
338 T j 53 19970101
Can anyone help?
CodePudding user response:
Sort by the columns you want to compare, then drop_duplicates on the key columns:
df.sort_values('ValIssueDate').drop_duplicates(['RateCompany', 'gs', 'RB'])
Output:
RateCompany gs RB ValIssueDate
115 T G 54 19580101
336 T S 54 19580101
338 T j 53 19970101
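A note on why sort-then-dedupe works: drop_duplicates keeps the first occurrence of each key by default (keep='first'), so sorting ascending by ValIssueDate beforehand means the earliest date in each (RateCompany, gs, RB) group survives. A minimal self-contained sketch:

```python
import pandas as pd
from io import StringIO

data = """RateCompany gs RB ValIssueDate
115 T G 54 19580101
116 T G 54 19870101
336 T S 54 19580101
337 T S 54 19870101
338 T j 53 19970101
"""
# The data rows have one more field than the header, so pandas
# uses the first field (115, 116, ...) as the index.
df = pd.read_csv(StringIO(data), sep=r"\s+")

# drop_duplicates keeps the first row it sees for each key, so
# sorting ascending by ValIssueDate first keeps the earliest date.
out = df.sort_values("ValIssueDate").drop_duplicates(["RateCompany", "gs", "RB"])
print(out)
```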
CodePudding user response:
Here is another way, using idxmin():
df.loc[df.groupby(['RateCompany', 'gs', 'RB'])['ValIssueDate'].idxmin()]
RateCompany gs RB ValIssueDate
115 T G 54 19580101
336 T S 54 19580101
338 T j 53 19970101
CodePudding user response:
You could use groupby with min:
>>> df.groupby(["RateCompany", "gs", "RB"], as_index=False)["ValIssueDate"].min()
RateCompany gs RB ValIssueDate
0 T G 54 19580101
1 T S 54 19580101
2 T j 53 19970101
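One difference between the answers worth noting: the idxmin and sort/drop_duplicates recipes return the original rows with their original index labels, while groupby(...).min() builds new aggregated rows with a fresh 0..n-1 index, as the output above shows. A self-contained sketch of the contrast:

```python
import pandas as pd
from io import StringIO

data = """RateCompany gs RB ValIssueDate
115 T G 54 19580101
116 T G 54 19870101
336 T S 54 19580101
337 T S 54 19870101
338 T j 53 19970101
"""
df = pd.read_csv(StringIO(data), sep=r"\s+")

# idxmin returns the row label of the minimum ValIssueDate per group,
# so .loc pulls back the full original rows, index labels included.
kept = df.loc[df.groupby(["RateCompany", "gs", "RB"])["ValIssueDate"].idxmin()]

# groupby(...).min() instead aggregates into new rows with a fresh
# RangeIndex, so the original labels (115, 336, 338) are lost.
agg = df.groupby(["RateCompany", "gs", "RB"], as_index=False)["ValIssueDate"].min()
```

Either result is fine here since every column is part of the key or the aggregate, but if the frame had extra columns, only the idxmin/drop_duplicates approaches would carry them through.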