I have a problem. I want to remove all rows where customerId and fromDate have the same values as in an earlier row. For example, rows 1 and 4 are the same, so row 4 should be removed. But how can I find the rows that are the same?
Dataframe
   customerId    fromDate
0           1  2021-02-22
1           1  2021-03-18
2           1  2021-03-22
3           1        None
4           1  2021-03-18
5           3  2021-02-22
6           3  2021-02-22
Code
import pandas as pd
d = {'customerId': [1, 1, 1, 1, 1, 3, 3],
     'fromDate': ['2021-02-22', '2021-03-18', '2021-03-22', None, '2021-03-18', '2021-02-22', '2021-02-22']
     }
df = pd.DataFrame(data=d)
print(df)
What I want
   customerId    fromDate
0           1  2021-02-22
1           1  2021-03-18
2           1  2021-03-22
3           1        None
5           3  2021-02-22
# Removed
# 4           1  2021-03-18
# 6           3  2021-02-22
CodePudding user response:
IIUC, you can use drop_duplicates to remove the duplicate rows in place:
df.drop_duplicates(inplace=True)
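If the real DataFrame has more columns than the two in the question, the call above compares whole rows. A minimal sketch, assuming you only want customerId and fromDate compared, is to pass them as subset; keep='first' is the default and is spelled out here only for clarity. The variable name result is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'customerId': [1, 1, 1, 1, 1, 3, 3],
    'fromDate': ['2021-02-22', '2021-03-18', '2021-03-22', None,
                 '2021-03-18', '2021-02-22', '2021-02-22'],
})

# Compare only the two key columns and keep the first occurrence of each pair.
result = df.drop_duplicates(subset=['customerId', 'fromDate'], keep='first')
print(result)
```

With the sample data this should leave the rows with index 0, 1, 2, 3 and 5, since rows 4 and 6 repeat earlier customerId/fromDate pairs.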
CodePudding user response:
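You can use:
df.drop_duplicates()
which drops all duplicate rows, keeping the first occurrence of each. It returns a new DataFrame, so assign the result back (df = df.drop_duplicates()) if you want to keep it.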
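To answer the "how can I find the rows" part of the question, a possible sketch uses duplicated, which returns a boolean mask marking every row that repeats an earlier one. The names mask and deduped are only illustrative, and restricting the comparison to the two columns via subset is an assumption about what you want.

```python
import pandas as pd

df = pd.DataFrame({
    'customerId': [1, 1, 1, 1, 1, 3, 3],
    'fromDate': ['2021-02-22', '2021-03-18', '2021-03-22', None,
                 '2021-03-18', '2021-02-22', '2021-02-22'],
})

# True for every row whose (customerId, fromDate) pair already appeared above it.
mask = df.duplicated(subset=['customerId', 'fromDate'], keep='first')

# Inspect the rows that would be removed.
print(df[mask])

# Keep everything else; same result as drop_duplicates with the same subset.
deduped = df[~mask]
print(deduped)
```

Here df[mask] should show rows 4 and 6, and deduped should match the desired output in the question.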