Pandas merge: how to keep only the first matching row?

Time:04-13

I know there are some similar questions, but few of them received correct answers and mine is different.

I have two DataFrames; you can create them by running this code:

import pandas as pd
from io import StringIO

df1s = """
    contract  name   type    
    A8        S       ILC               
    A9        S       ILC               
"""
df1 = pd.read_csv(StringIO(df1s.strip()), sep=r'\s+')

df2s = """
     name   type              Basis 
     S       ILC              PO193            
     S       ILC              PO202            
"""
df2 = pd.read_csv(StringIO(df2s.strip()), sep=r'\s+')

Then I merge them:

df_ia = df1.merge(df2, on=['name', 'type'], how='left')

df_ia

Output:

   contract name    type    Basis
0   A8      S       ILC     PO193
1   A8      S       ILC     PO202
2   A9      S       ILC     PO193
3   A9      S       ILC     PO202

How can I keep only the first matched row for each row of df1? The output should be:

   contract name    type    Basis
0   A8      S       ILC     PO193
1   A9      S       ILC     PO193

CodePudding user response:

Call drop_duplicates on df2 before merging, so that only the first row for each (name, type) pair remains:

>>> df1.merge(df2.drop_duplicates(["name", "type"]), how="left")
  contract name type  Basis
0       A8    S  ILC  PO193
1       A9    S  ILC  PO193
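
For reference, here is a self-contained sketch of the same approach. It builds the question's two frames directly rather than parsing them from text, and it adds two things that are not in the answer above and should be read as optional assumptions: an explicit keep="first" (which is already the default of drop_duplicates) and validate="many_to_one" on the merge, which makes pandas raise an error if the right-hand frame still contains duplicate keys. The names keys, df2_first and result are illustrative only.

```python
import pandas as pd

# Same data as in the question, built directly instead of parsed from text.
df1 = pd.DataFrame({
    "contract": ["A8", "A9"],
    "name": ["S", "S"],
    "type": ["ILC", "ILC"],
})
df2 = pd.DataFrame({
    "name": ["S", "S"],
    "type": ["ILC", "ILC"],
    "Basis": ["PO193", "PO202"],
})

keys = ["name", "type"]  # illustrative name for the join columns

# keep="first" is the default; it is spelled out here to make the intent explicit.
df2_first = df2.drop_duplicates(subset=keys, keep="first")

# validate="many_to_one" asks pandas to raise MergeError if the right side
# still has duplicate keys, so the row count cannot silently grow.
result = df1.merge(df2_first, on=keys, how="left", validate="many_to_one")
print(result)
```

Which row counts as "first" depends on the current row order of df2, so sort df2 beforehand if a particular Basis value should win.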