Find rows in DataFrame that have duplicates in two columns

Time:11-19

The DataFrame is given as follows

import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 9, 2, 9, 6], 'col2': [13, 4, 5, 4, 5, 0], 'col3': [8, 23, 5, 4, 9, 5]})
   col1  col2  col3
0     1    13     8
1     2     4    23
2     9     5     5
3     2     4     4
4     9     5     9
5     6     0     5

How can I filter this DataFrame so that I keep only the rows whose combination of col1 and col2 values appears more than once? The resulting DataFrame should look like this:

df_new
   col1  col2  col3
0     2     4    23
1     2     4     4
2     9     5     5
3     9     5     9

CodePudding user response:

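Use pd.DataFrame.duplicated() with subset set to the two columns and keep=False. With keep=False every row that belongs to a duplicated (col1, col2) pair is marked True, not only the second and later occurrences: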

df_new = df[df.duplicated(subset=["col1", "col2"], keep=False)]
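Note that the filtered rows keep their original index labels and order. If the duplicate pairs should sit next to each other with a fresh 0-based index, as in the layout asked for in the question, sorting and resetting the index is one way to get there. The following is a minimal sketch using the question's data; the names mask and df_sorted are only illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 2, 9, 2, 9, 6],
    "col2": [13, 4, 5, 4, 5, 0],
    "col3": [8, 23, 5, 4, 9, 5],
})

# Boolean mask: True for every row whose (col1, col2) pair occurs more than once.
mask = df.duplicated(subset=["col1", "col2"], keep=False)

# Filtered rows keep their original index labels (1, 2, 3, 4) and order.
df_new = df[mask]
print(df_new)

# Optional: place equal (col1, col2) pairs next to each other and renumber from 0.
df_sorted = df_new.sort_values(["col1", "col2"]).reset_index(drop=True)
print(df_sorted)
```

With the default keep="first", the first occurrence of each pair would be marked False and dropped by the filter, so keep=False is the setting that returns all members of each duplicated pair.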