Keep one value of a duplicate

Time:01-11

I have a pandas DataFrame with possibly duplicated id values. For each id I would like to keep a single row: the one with the value yes in the ans column if there is one, otherwise the row that is already there.

import pandas as pd
import numpy as np

data = {
'id': [1, 1, 2, 3, 4, 5, 5, 6, 7, 8, 8, 9, 9, 10],
'ans': ['no', 'yes', 'yes', 'no', 'no', 'yes', 'no', 'yes', 'no', 'no', 'yes', 'no', 'yes', 'no']
}

df = pd.DataFrame(data)
df.head(n = 8)

The expected output is:

data2 = {
'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'ans': ['yes', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'yes', 'yes', 'no']
}

df2 = pd.DataFrame(data2)
df2.head(n = 10)

Thanks in advance!

CodePudding user response:

To keep only the rows where ans is yes (note that this drops any id that has no yes row), you could use either

df.query("ans=='yes'")

or

df.loc[df.ans == 'yes',:]
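
As a quick check of what these two filters return, here is a minimal sketch that reuses the data from the question. Both forms select the same rows, and the ids without any yes row disappear, so the filter alone may not give the expected output shown in the question.

```python
import pandas as pd

data = {
    'id': [1, 1, 2, 3, 4, 5, 5, 6, 7, 8, 8, 9, 9, 10],
    'ans': ['no', 'yes', 'yes', 'no', 'no', 'yes', 'no', 'yes', 'no', 'no', 'yes', 'no', 'yes', 'no'],
}
df = pd.DataFrame(data)

# Both forms keep only the rows where ans is 'yes'.
by_query = df.query("ans == 'yes'")
by_loc = df.loc[df['ans'] == 'yes', :]

# ids present in the original frame but absent after filtering.
missing_ids = sorted(set(df['id'].tolist()) - set(by_query['id'].tolist()))
print(missing_ids)  # [3, 4, 7, 10]
```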

CodePudding user response:

If I understand correctly, to keep every row of the ids that have at least one yes, use:

df = pd.DataFrame(data)
df = df[df['id'].isin(df.loc[df['ans'].eq('yes'), 'id'])]
print(df)
    id  ans
0    1   no
1    1  yes
2    2  yes
5    5  yes
6    5   no
7    6  yes
9    8   no
10   8  yes
11   9   no
12   9  yes

Or, to keep one row per id, preferring the yes row:

df = pd.DataFrame(data)
df = df.loc[df['ans'].eq('yes').groupby(df['id']).idxmax()]
print(df)
    id  ans
1    1  yes
2    2  yes
3    3   no
4    4   no
5    5  yes
7    6  yes
8    7   no
10   8  yes
12   9  yes
13  10   no
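
The idxmax version works because the boolean Series is True for yes rows, and idxmax returns the label of the first maximum in each id group: the first yes row if the group has one, otherwise the group's first row. An alternative sketch (not from the answers above) gets the same rows by sorting yes rows ahead of no rows within each id and then dropping duplicate ids. The helper column name is_yes is made up for illustration, and the result is compared against the expected frame from the question.

```python
import pandas as pd

data = {
    'id': [1, 1, 2, 3, 4, 5, 5, 6, 7, 8, 8, 9, 9, 10],
    'ans': ['no', 'yes', 'yes', 'no', 'no', 'yes', 'no', 'yes', 'no', 'no', 'yes', 'no', 'yes', 'no'],
}
df = pd.DataFrame(data)

result = (
    df.assign(is_yes=df['ans'].eq('yes'))   # hypothetical helper column
      # Within each id, put rows with is_yes == True first.
      .sort_values(['id', 'is_yes'], ascending=[True, False])
      # Keep the first row per id, i.e. the yes row when one exists.
      .drop_duplicates('id', keep='first')
      .drop(columns='is_yes')
      .reset_index(drop=True)
)

# Expected output as given in the question.
expected = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'ans': ['yes', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'yes', 'yes', 'no'],
})
print(result.equals(expected))
```

Unlike the idxmax version, this sketch resets the index; drop the reset_index call if the original row labels matter.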