Suppose I have a groupby object (grouped on Col1) like below:
Col1  Col2  Col3  Col4  Col5
----------------------------------
AAA   001   456   846   239   row1
      002   374   238   904   row2
      003   456   846   239   row3
BBB   001   923   222   398   row1
      002   923   222   398   row2
      003   755   656   949   row3
CCC   001   324   454   565   row1
      002   744   345   336   row2
      003   567   355   756   row3
Is there a way to check for duplicate rows based on [Col3, Col4, Col5] within each group? In the example above, for group AAA, row 1 and row 3 match since their Col3, Col4, and Col5 values are the same. Likewise, in group BBB, row 1 and row 2 match. In group CCC, by the same logic, there are no duplicate rows.
Can we create a list with 'n' boolean elements (where 'n' is the number of groups), filled with True if the group contains any duplicate and False otherwise? For the example above, the output would be:
[True, True, False]
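For reference, a minimal sketch reconstructing the example data as a pandas DataFrame (the column dtypes are assumptions; the trailing row1/row2/row3 labels above are treated as display markers, not a column):
out = None  # placeholder; see the answer below
import pandas as pd

df = pd.DataFrame({
    'Col1': ['AAA'] * 3 + ['BBB'] * 3 + ['CCC'] * 3,
    'Col2': ['001', '002', '003'] * 3,
    'Col3': [456, 374, 456, 923, 923, 755, 324, 744, 567],
    'Col4': [846, 238, 846, 222, 222, 656, 454, 345, 355],
    'Col5': [239, 904, 239, 398, 398, 949, 565, 336, 756],
})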
CodePudding user response:
You can groupby column Col1, then use duplicated() within each group to check whether there are any duplicates across Col3 to Col5:
out = (df.groupby('Col1')
         # True for a group if any row repeats the Col3-Col5 values of an earlier row
         .apply(lambda g: g[['Col3', 'Col4', 'Col5']].duplicated().any())
         .tolist())
print(out)
[True, True, False]
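If you prefer to avoid apply, a possible alternative (a sketch, not part of the original answer) is to include Col1 in the duplicated() subset, so a row only counts as a duplicate when it matches another row in the same group, then aggregate per group:
# Including Col1 in the subset restricts duplicate detection to rows
# within the same group; groupby(...).any() then gives one flag per group.
out = (df.duplicated(subset=['Col1', 'Col3', 'Col4', 'Col5'])
         .groupby(df['Col1'])
         .any()
         .tolist())
print(out)  # [True, True, False]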