I have a dataframe with a field that is supposed to be unique. In the data I am given, the field is not unique, so I have been using drop_duplicates to get rid of the duplicates. However, I would like to see which rows I am dropping for QC. I've been reading threads on this, but I've only seen ones that look at entire duplicate rows (not just one duplicated field), or that compare dataframes that don't have duplicates within themselves. How can I get a dataframe of the rows that are removed by my code below? Thank you!
df = df.drop_duplicates(subset='_nefin_tree_obsID', keep=False)
CodePudding user response:
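Refer to the documentation for DataFrame.duplicated.
It returns a boolean mask rather than a dataframe, so use it to index df. Pass the same subset and keep arguments you give to drop_duplicates; with keep=False, every row whose ID appears more than once is flagged, which is exactly the set of rows your code removes. Run this before you overwrite df:
removed = df[df.duplicated(subset='_nefin_tree_obsID', keep=False)]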
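Here is a minimal sketch of how this could look end to end. The sample data and the height column are made up for illustration; only the _nefin_tree_obsID column name comes from the question. It assumes you want to match keep=False, as in your drop_duplicates call.

```python
import pandas as pd

# Hypothetical sample data; only the column name is taken from the question.
df = pd.DataFrame({
    "_nefin_tree_obsID": [1, 2, 2, 3, 3, 3, 4],
    "height": [10.0, 11.5, 11.7, 9.2, 9.2, 9.4, 12.1],
})

# keep=False flags every row whose ID occurs more than once,
# mirroring drop_duplicates(..., keep=False).
dup_mask = df.duplicated(subset="_nefin_tree_obsID", keep=False)

# Rows that would be dropped, grouped by ID for easier QC review.
removed = df[dup_mask].sort_values("_nefin_tree_obsID")

# Rows that survive; same result as the drop_duplicates call.
kept = df[~dup_mask]

# Sanity check: the mask and drop_duplicates agree.
assert kept.equals(df.drop_duplicates(subset="_nefin_tree_obsID", keep=False))
```

If you later switch to keep='first' or keep='last' in drop_duplicates, use the same value in duplicated so that removed still matches what is actually dropped.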