Remove rows that duplicate an earlier row in two columns

Time:05-24

I have a problem. I want to remove every row whose customerId and fromDate both match those of an earlier row. For example, rows 1 and 4 are the same, so row 4 should be removed. How can I find the rows that are duplicates?

DataFrame

   customerId    fromDate
0           1  2021-02-22
1           1  2021-03-18
2           1  2021-03-22
3           1        None
4           1  2021-03-18
5           3  2021-02-22
6           3  2021-02-22

Code

import pandas as pd


d = {'customerId': [1, 1, 1, 1, 1, 3, 3],
     'fromDate': ['2021-02-22', '2021-03-18', '2021-03-22', None, '2021-03-18', '2021-02-22', '2021-02-22']
    }
df = pd.DataFrame(data=d)
print(df)

What I want

   customerId    fromDate
0           1  2021-02-22
1           1  2021-03-18
2           1  2021-03-22
3           1        None
5           3  2021-02-22

# Removed
# 4           1  2021-03-18
# 6           3  2021-02-22

CodePudding user response:

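IIUC, you can use drop_duplicates to remove the duplicate rows. By default it compares all columns (here just customerId and fromDate) and keeps the first occurrence: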

df.drop_duplicates(inplace = True)
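If the real DataFrame has more columns than the two shown, the comparison can be limited with the subset parameter. The following is a minimal sketch using the question's data; the variable name deduped is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
d = {'customerId': [1, 1, 1, 1, 1, 3, 3],
     'fromDate': ['2021-02-22', '2021-03-18', '2021-03-22', None,
                  '2021-03-18', '2021-02-22', '2021-02-22']}
df = pd.DataFrame(data=d)

# Compare only customerId and fromDate; keep the first row of each
# duplicate group and drop the later ones.
deduped = df.drop_duplicates(subset=['customerId', 'fromDate'], keep='first')
print(deduped)
```

With this data, rows 4 and 6 are dropped and the original index labels of the remaining rows are preserved.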

CodePudding user response:

You can use:

df.drop_duplicates()

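This drops all duplicate rows, keeping the first occurrence of each. It returns a new DataFrame, so assign the result if you want to keep it.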
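To answer the "how could I find the row" part of the question, the duplicated method returns a boolean mask that marks the repeated rows. This is a rough sketch on the question's data; mask, repeats and result are illustrative names.

```python
import pandas as pd

# Sample data copied from the question.
d = {'customerId': [1, 1, 1, 1, 1, 3, 3],
     'fromDate': ['2021-02-22', '2021-03-18', '2021-03-22', None,
                  '2021-03-18', '2021-02-22', '2021-02-22']}
df = pd.DataFrame(data=d)

# True for every row that repeats an earlier (customerId, fromDate) pair.
mask = df.duplicated(subset=['customerId', 'fromDate'], keep='first')

# The rows that would be removed.
repeats = df[mask]
print(repeats)

# Inverting the mask keeps only the first occurrence of each pair.
result = df[~mask]
print(result)
```

Passing keep=False instead would mark every member of a duplicate group, including the first occurrence, which is useful for inspecting the groups before deciding what to drop.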
