How do I drop duplicate rows only if the value in a second column is NOT equal?

I have a pandas DataFrame. I want to drop all rows whose Title is duplicated (keep=False logic), but only if the values in a second column, Format, are NOT the same across those rows, and then count the unique values left in Title. The other columns (Publisher, Year) should be disregarded.

Original df:

Title    Format    Publisher    Year
T1       F1        P1           2010
T1       F1        P2           2014
T2       F2        P1           2012
T3       F1        P3           2016
T4       F3        P2           2009
T4       F1        P3           2010
T4       F2        P3           2011

Desired filtered df:

Title    Format    Publisher    Year
T1       F1        P1           2010
T1       F1        P2           2014
T2       F2        P1           2012
T3       F1        P3           2016

And then I would just use df["Title"].nunique() to get 3. I need both the filtered df and the final number for further analysis.

Thanks!

CodePudding user response:

Use:

df[df.groupby('Title')["Format"].transform('nunique').eq(1)]
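To see the one-liner above at work, here is a minimal sketch run against the sample data from the question. The variable names formats_per_title, filtered and n_titles are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Title": ["T1", "T1", "T2", "T3", "T4", "T4", "T4"],
    "Format": ["F1", "F1", "F2", "F1", "F3", "F1", "F2"],
    "Publisher": ["P1", "P2", "P1", "P3", "P2", "P3", "P3"],
    "Year": [2010, 2014, 2012, 2016, 2009, 2010, 2011],
})

# Count the distinct Format values per Title and broadcast that count
# back onto every row of the group (same index as df).
formats_per_title = df.groupby("Title")["Format"].transform("nunique")

# Keep only the rows whose Title has exactly one Format.
filtered = df[formats_per_title.eq(1)]

# Number of titles that survive the filter.
n_titles = filtered["Title"].nunique()

print(filtered)
print(n_titles)
```

T4 appears with three different formats, so all of its rows are dropped. T1 is repeated but always with F1, so both of its rows stay, and n_titles comes out as 3.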

CodePudding user response:

You can use DataFrame.drop_duplicates, which keeps the first row for each (Title, Format) pair:

df.drop_duplicates(subset=['Title', 'Format'])
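For comparison, here is a small sketch of what this call returns on the sample data from the question. The name deduped is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Title": ["T1", "T1", "T2", "T3", "T4", "T4", "T4"],
    "Format": ["F1", "F1", "F2", "F1", "F3", "F1", "F2"],
    "Publisher": ["P1", "P2", "P1", "P3", "P2", "P3", "P3"],
    "Year": [2010, 2014, 2012, 2016, 2009, 2010, 2011],
})

# Drop rows that repeat an earlier (Title, Format) pair; the default
# keep="first" retains the first occurrence of each pair.
deduped = df.drop_duplicates(subset=["Title", "Format"])

print(deduped)
print(deduped["Title"].nunique())
```

On this data the second T1 row is removed and all three T4 rows are kept, so the result still has 4 unique titles. That is not the desired filtered df from the question, so this approach only fits if the goal is to collapse repeated (Title, Format) pairs.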