I was trying to check a file containing duplicate data using pandas.
Student|"Details"
Joe|"December 2017|chemistry"
Bob|"April 2018|chemistry|Biology"
sam|"December 2018|physics"
I want to check whether any value in the second column (Details) is duplicated. If a value is duplicated, print every line that contains it. So here the output should be
Joe|"December 2017|chemistry"
Bob|"April 2018|chemistry|Biology"
CodePudding user response:
Split the Details column by '|', explode, check if each value is duplicated, group by the index and use max aggregation to create a boolean mask. Use this mask to filter.
# Flag rows that contain at least one duplicated pipe-separated value
mask = (df['Details'].str.split('|')
                     .explode()
                     .duplicated(keep=False)
                     .groupby(level=0).max())
df[mask]
[out]
Student Details
0 Joe December 2017|chemistry
1 Bob April 2018|chemistry|Biology
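For completeness, here is a self-contained sketch of the same approach. The DataFrame is built by hand from the sample rows (an assumption; loading the actual file would need the appropriate read_csv options for your delimiter and quoting):
import pandas as pd

# Reconstruct the sample data (hypothetical; replace with your own loading step)
df = pd.DataFrame({
    'Student': ['Joe', 'Bob', 'sam'],
    'Details': ['December 2017|chemistry',
                'April 2018|chemistry|Biology',
                'December 2018|physics'],
})

# One row per pipe-separated value, keeping the original row index,
# then flag values that occur more than once and collapse back per row
mask = (df['Details'].str.split('|')
                     .explode()
                     .duplicated(keep=False)
                     .groupby(level=0).max())

print(df[mask])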
CodePudding user response:
The pandas duplicated method should identify your duplicates:
df.duplicated(subset='Details')
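A minimal usage sketch (again building the DataFrame by hand as an assumption). Note that DataFrame.duplicated(subset='Details') flags rows whose whole Details string repeats, not the individual pipe-separated values inside it:
import pandas as pd

df = pd.DataFrame({
    'Student': ['Joe', 'Bob', 'sam'],
    'Details': ['December 2017|chemistry',
                'April 2018|chemistry|Biology',
                'December 2018|physics'],
})

# keep=False marks every occurrence of a repeated Details value
print(df[df.duplicated(subset='Details', keep=False)])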