e.g.
INPUT: one dataframe

Name    id  Price
Apple   01  13.86
Cherry  02  13.24
Banana  02   1.99
Peach   03  14.76
Orange  04   2.48

OUTPUT: two dataframes
one with duplicate dataframe['id'] values:

Name    id  Price
Cherry  02  13.24
Banana  02   1.99

the other without duplicate dataframe['id'] values:

Name    id  Price
Apple   01  13.86
Peach   03  14.76
Orange  04   2.48

Many thanks
CodePudding user response:
INPUT: df; OUTPUT: df_duplicated, df_unique

import pandas as pd

# keep=False flags every row whose id occurs more than once, not just the later occurrences.
df_duplicated = df[df['id'].duplicated(keep=False)]
# Everything not in df_duplicated: concatenate the two frames, then drop rows that now appear twice.
df_unique = pd.concat([df, df_duplicated]).drop_duplicates(keep=False)

print(df_duplicated)
print(df_unique)
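Equivalently, the second frame can be built from the inverse of the same boolean mask, which avoids the concat round-trip. A minimal sketch using the question's sample data (ids kept as strings to preserve the leading zeros):

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'Name': ['Apple', 'Cherry', 'Banana', 'Peach', 'Orange'],
    'id': ['01', '02', '02', '03', '04'],
    'Price': [13.86, 13.24, 1.99, 14.76, 2.48],
})

# keep=False marks every member of a duplicated group.
mask = df['id'].duplicated(keep=False)
df_duplicated = df[mask]   # Cherry, Banana
df_unique = df[~mask]      # Apple, Peach, Orange
```

Since both frames come from one mask, every row of df lands in exactly one of them.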
CodePudding user response:
# keep=False in drop_duplicates removes every row whose id repeats.
noDuplicate = data.drop_duplicates('id', keep=False)
print("No Duplicates:\n", noDuplicate)

# keep=False in duplicated marks every row whose id repeats.
duplicate = data[data['id'].duplicated(keep=False)]
print("Duplicates:\n", duplicate)
CodePudding user response:
You can count the occurrences of each unique identifier and then merge the counts back onto your dataframe to separate the unique and duplicated rows.
As an example:
import pandas as pd

df = pd.DataFrame(data={'Id': [1, 2, 2, 3, 4]})

# Count how many times each Id occurs.
agg_df = df.groupby(by='Id').agg(count=('Id', 'count'))
agg_df.reset_index(inplace=True)

# count == 1 -> unique rows; count > 1 -> duplicated rows.
unique_df = agg_df.loc[agg_df['count'] == 1].merge(df, on=['Id'])
duplicated_df = agg_df.loc[agg_df['count'] > 1].merge(df, on=['Id'])
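The same count-and-filter idea can be written more compactly with groupby().transform, which broadcasts each group's size back onto its rows and so avoids the separate aggregate frame and the merge. A sketch on the same example data:

```python
import pandas as pd

df = pd.DataFrame(data={'Id': [1, 2, 2, 3, 4]})

# transform('count') returns one group size per original row.
counts = df.groupby('Id')['Id'].transform('count')

unique_df = df[counts == 1]       # Ids 1, 3, 4
duplicated_df = df[counts > 1]    # Id 2 (both rows)
```

This keeps the original row order and index, which the merge-based version does not guarantee.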