I have the following dataset:
Col_A Amounts
0 A 100
1 B 200
2 C 500
3 D 100
4 E 500
5 F 300
The output I am trying to achieve is to remove all rows whose "Amounts" value is duplicated, keeping only the rows whose value occurs exactly once.
Desired Output:
Col_A Amounts
1 B 200
5 F 300
I have tried to use the following with no luck:
df_1.drop_duplicates(subset=['Amounts'])
This removes the duplicates, but it still keeps one row for each value that occurred more than once.
Using the pandas .unique function also produces a similarly undesired output.
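For reference, a minimal reproduction of that attempt (assuming the DataFrame is named df_1 as above); the default keep='first' retains the first occurrence of each duplicated value:
import pandas as pd

df_1 = pd.DataFrame({'Col_A': list('ABCDEF'),
                     'Amounts': [100, 200, 500, 100, 500, 300]})

# keep defaults to 'first', so one row per duplicated value survives
print(df_1.drop_duplicates(subset=['Amounts']))
#   Col_A  Amounts
# 0     A      100
# 1     B      200
# 2     C      500
# 5     F      300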
CodePudding user response:
You are close; you need keep=False to remove all duplicates per the Amounts column:
print (df.drop_duplicates(subset=['Amounts'], keep=False))
Col_A Amounts
1 B 200
5 F 300
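An equivalent approach is a boolean mask built from duplicated, again with keep=False so that every occurrence of a repeated value is flagged and then negated away:
print(df[~df['Amounts'].duplicated(keep=False)])
#   Col_A  Amounts
# 1     B      200
# 5     F      300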
CodePudding user response:
Less straightforward than the previous answer, but if you want to be able to keep the rows that appear n times, you could use value_counts() as a mask and keep only the rows that appear exactly / at least / fewer than n times:
import pandas as pd

data = {'Col_1': ['A', 'B', 'C', 'D', 'E', 'F'],
        'Amounts': [100, 200, 500, 100, 500, 300]}
df = pd.DataFrame(data)

n = 1
# value_counts() gives the number of occurrences of each Amounts value
mask = df.Amounts.value_counts()
# keep rows whose value appears fewer than n + 1 times, i.e. at most n times
df[df.Amounts.isin(mask.index[mask.lt(n + 1)])]
outputs:
Col_1 Amounts
1 B 200
5 F 300
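The same mask generalises to other counts: mask.eq(n) keeps values that appear exactly n times, and mask.ge(n) keeps those that appear at least n times. For example, with a hypothetical n = 2:
n = 2
mask = df.Amounts.value_counts()
print(df[df.Amounts.isin(mask.index[mask.eq(n)])])
#   Col_1  Amounts
# 0     A      100
# 2     C      500
# 3     D      100
# 4     E      500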