Keep duplicated row from a specific data frame after append


I have two data frames (df1 & df2). After appending them, I end up with two duplicate rows (same values in the three columns ["ID", "City", "Year"]). I would like to keep the duplicate row that comes from df2.

import pandas as pd

data1 = {
    'ID': [7, 2],
    'City': ["Berlin", "Paris"],
    'Year': [2012, 2000],
    'Number': [62, 43],
}

data2 = {
    'ID': [7, 5],
    'City': ["Berlin", "London"],
    'Year': [2012, 2019],
    'Number': [60, 100],
}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

df_merged = df1.append(df2)

Is there a way to do this?

Expected output:

    ID    City  Year  Number
0   2   Paris  2000      43
1   7  Berlin  2012      60
2   5  London  2019     100

CodePudding user response:

new = (pd.concat([df1, df2])
           .drop_duplicates(subset=["ID", "City", "Year"],
                            keep="last",
                            ignore_index=True))

DataFrame.append is deprecated and will be removed in a future pandas version, so use pd.concat instead. Then call drop_duplicates over those three columns with keep="last", so that for each duplicate group the row from df2 (concatenated last) is the one that survives:

In [376]: df1
Out[376]:
   ID    City  Year  Number
0   7  Berlin  2012      62
1   2   Paris  2000      43

In [377]: df2
Out[377]:
   ID    City  Year  Number
0   7  Berlin  2012      60
1   5  London  2019     100

In [378]: (pd.concat([df1, df2])
     ...:     .drop_duplicates(subset=["ID", "City", "Year"],
     ...:                      keep="last",
     ...:                      ignore_index=True))
Out[378]:
   ID    City  Year  Number
0   2   Paris  2000      43
1   7  Berlin  2012      60
2   5  London  2019     100

ignore_index=True resets the index to 0, 1, 2 after drop_duplicates leaves gaps in it.
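
For comparison, a minimal sketch reusing the same df1 and df2 from above: if ignore_index is left at its default of False, the surviving rows keep the labels they had in their source frames, and you would need reset_index to renumber them afterwards.

out = pd.concat([df1, df2]).drop_duplicates(subset=["ID", "City", "Year"],
                                            keep="last")
print(out.index.tolist())   # [1, 0, 1] - leftover labels from df1 and df2
# out.reset_index(drop=True) would give the same 0, 1, 2 index after the fact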
