Home > Software engineering >  Split with pandas dataframe column duplicate values into two dataframes, one with duplicate, one wit
Split with pandas dataframe column duplicate values into two dataframes, one with duplicate, one wit

Time:04-05

eg.

INPUT: one dataframe

   Name     id     Price
   Apple     01       13.86
   Cherry    02       13.24
   Banana    02       1.99
   Peach     03       14.76
   Orange    04       2.48

OUTPUT: two dataframes

one with with duplicate dataframe[id]:

   Name     id     Price
   Cherry    02       13.24
   Banana    02       1.99

other without duplicate dataframe[id]:

   Name     id     Price
   Apple     01       13.86
   Peach     03       14.76
   Orange    04       2.48

Many thanks

CodePudding user response:

INPUT: df; OUTPUT: df_duplicated, df_unique

df_duplicated = df[df['id'].duplicated(keep=False)]
df_unique = pd.concat([df, df_duplicated]).drop_duplicates(keep=False)

print(df_duplicated)
print(df_unique)

CodePudding user response:

noDuplicate = data.drop_duplicates('id', keep=False)
print("No Duplicates:", noDuplicate)

duplicate = data[data['id'].duplicated(keep=False)]
print("Duplicates:", duplicate)

CodePudding user response:

You can count the occurrence of each unique identifier and then merge the result on your dataframe to get the unique and duplicate values.

As an example:

df = pd.DataFrame(data={'Id': [1, 2, 2, 3, 4]})
agg_df = df.groupby(by='Id').agg(count=('Id', 'count'))
agg_df.reset_index(inplace=True)
filtered_df = agg_df.loc[agg_df['count'] == 1].merge(df, on=['Id'])
unique_df = agg_df.loc[agg_df['count'] > 1].merge(df, on=['Id'])
  • Related