Home > Net >  Keep unique values with only 1 instance
Keep unique values with only 1 instance

Time:11-10

I have the following dataset:

    Col_A   Amounts
0   A      100
1   B      200
2   C      500
3   D      100
4   E      500
5   F      300

The output I am trying to achieve is to basically remove all values based on the "Amounts" column which have a duplicate value and to keep only the rows where there is one unique instance of a value.

Desired Output:

    Col_A   Amounts
1   B      200
5   F      300

I have tried to use the following with no luck:

df_1.drop_duplicates(subset=['Amounts'])

This removes the duplicates, however, it still keeps the values which have occurred more than once.

Using the pandas .unique function also provides a similiar undesired output.

CodePudding user response:

You are close, need keep=False for remove all duplicates per Amounts column:

print (df.drop_duplicates(subset=['Amounts'], keep=False))
  Col_A  Amounts
1     B      200
5     F      300

CodePudding user response:

Less straight forward than the previous answer, but if you want to be able keep the rows that appear n times, you could use value_counts() as a mask, and keep only the rows that appear exactly / at least / less than n times:

import pandas as pd

data = { 'Col_1': ['A','B','C','D','E','F'],
        'Amounts': [100, 200, 500, 100, 500,300]
        }
df = pd.DataFrame(data)

n=1

mask = df.Amounts.value_counts()
df[df.Amounts.isin(mask.index[mask.lt(n 1)])]

outputs:

  Col_1  Amounts
1     B      200
5     F      300
  • Related