I am trying to create a new DataFrame with the columns id and name for all rows whose id is duplicated in the original DataFrame.
My DataFrame's structure is:
id, name, lat, lon, price, minimum_nights, review_cnt
I tried the .duplicated() function, but I am not getting what I need. I think I might be using it wrong.
Answer:
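By default, .duplicated() uses keep='first': it returns a boolean Series that is True for every duplicated row except the first occurrence. It also compares all columns unless you pass subset. To get every row whose id is duplicated, including the first occurrence, pass keep=False, restrict the comparison to the id column with subset='id', and then select the id and name columns:
import pandas as pd
# keep=False marks every occurrence of a repeated id, including the first one
dup_ids = df.loc[df.duplicated(subset='id', keep=False), ['id', 'name']].copy()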
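To see how keep and subset change the result, here is a small self-contained sketch. The DataFrame values are made up for illustration (only the column names id, name and price come from the question). It contrasts the default keep='first', flagging every occurrence of a repeated id with keep=False, and checking duplicates on id and name together.

```python
import pandas as pd

# Hypothetical sample data: column names follow the question, values are invented.
df = pd.DataFrame({
    "id": [1, 2, 2, 3, 3, 3, 4],
    "name": ["a", "b", "b", "c", "c2", "c", "d"],
    "price": [10, 20, 20, 30, 31, 30, 40],
})

# Default keep='first': the first occurrence of each id is NOT flagged.
first_only = df.duplicated(subset="id")
print(first_only.tolist())  # -> [False, False, True, False, True, True, False]

# keep=False: every row whose id appears more than once is flagged.
all_dupes = df.duplicated(subset="id", keep=False)

# New DataFrame with only id and name for the duplicated ids.
dup_ids = df.loc[all_dupes, ["id", "name"]].reset_index(drop=True)
print(dup_ids)

# For comparison: duplicates judged on the (id, name) pair together.
# Row 4 (id 3, name "c2") is missing here because its pair is unique.
pairs = df[["id", "name"]]
dup_pairs = pairs[pairs.duplicated(keep=False)]
print(dup_pairs)
```

Checking on id alone catches rows that share an id but differ in name, which a check over both columns would miss; pick the subset that matches what "duplicate" means for your data.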