I have a pandas dataframe in which I want to find the number of unique values in column Title. First I want to drop all rows whose Title is duplicated (keep=False logic), but only if a second column, Format, is NOT the same across those rows. Other columns (Publisher, Year) should be disregarded.
Original df:
Title Format Publisher Year
T1 F1 P1 2010
T1 F1 P2 2014
T2 F2 P1 2012
T3 F1 P3 2016
T4 F3 P2 2009
T4 F1 P3 2010
T4 F2 P3 2011
Desired filtered df:
Title Format Publisher Year
T1 F1 P1 2010
T1 F1 P2 2014
T2 F2 P1 2012
T3 F1 P3 2016
And then I would just use df["Title"].nunique() to get 3. I need both the filtered df and the final number for further analysis.
Thanks!
CodePudding user response:
Use:
df[df.groupby('Title')["Format"].transform('nunique').eq(1)]
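As a minimal sketch of how this plays out on the sample data from the question: the transform gives every row the number of distinct Format values its Title has, and only rows where that count is 1 are kept. The variable names formats_per_title, filtered and n_titles are illustrative, not from the original post.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Title": ["T1", "T1", "T2", "T3", "T4", "T4", "T4"],
    "Format": ["F1", "F1", "F2", "F1", "F3", "F1", "F2"],
    "Publisher": ["P1", "P2", "P1", "P3", "P2", "P3", "P3"],
    "Year": [2010, 2014, 2012, 2016, 2009, 2010, 2011],
})

# For each row, the number of distinct Format values within its Title group
formats_per_title = df.groupby("Title")["Format"].transform("nunique")

# Keep only titles that appear with a single Format (T4 has three, so it goes)
filtered = df[formats_per_title.eq(1)]

# Number of unique titles left after filtering
n_titles = filtered["Title"].nunique()

print(filtered)
print(n_titles)
```

Here filtered holds the T1, T1, T2 and T3 rows and n_titles is 3, which matches the desired output in the question.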
CodePudding user response:
You can use DataFrame.drop_duplicates, which by default keeps the first row of each (Title, Format) combination:
df.drop_duplicates(subset=['Title', 'Format'])
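It may be worth checking this against the sample data before relying on it. The sketch below (the name deduped is illustrative) suggests it behaves differently from what the question asks for. It removes repeated (Title, Format) pairs, so the second T1 row is dropped while all three T4 rows, which have distinct formats, are kept.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Title": ["T1", "T1", "T2", "T3", "T4", "T4", "T4"],
    "Format": ["F1", "F1", "F2", "F1", "F3", "F1", "F2"],
    "Publisher": ["P1", "P2", "P1", "P3", "P2", "P3", "P3"],
    "Year": [2010, 2014, 2012, 2016, 2009, 2010, 2011],
})

# Keep the first row of every (Title, Format) pair; later repeats are dropped
deduped = df.drop_duplicates(subset=["Title", "Format"])

print(deduped)
print(deduped["Title"].nunique())
```

On this data deduped has six rows and four unique titles, not the four rows and three titles the question expects.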