Home > Software engineering >  drop all duplicate values in python
drop all duplicate values in python

Time:05-18

Assume I have the following dataframe in Python:

A = [['A',2,3],['A',5,4],['B',8,9],['C',8,10],['C',9,20],['C',10,20]]
B = pd.DataFrame(A, columns = ['Col1','Col2','Col3'])

enter image description here

This gives me the above dataframe: I want to remove the rows that have the same values for Col1 but different values for Col3. I have tried to use drop_duplicates command with different subset of columns but it does not give what I want. I can write for loop but that is not efficient at all (since you might have much more columns than this).

C= B.drop_duplicates(['Col1','Col3'],keep = False)

enter image description here

Can anyone help if there is any command in Python can do this without using for loop?

CodePudding user response:

This can do the job,

grouped_df = df.groupby(["Col1"])
groups = []
for group in df["Col1"].unique():
  group_df = grouped_df.get_group(group)
  groups.append(group_df.loc[group_df["Col3"].drop_duplicates(keep = False).index])
new_df = pd.concat(groups)

Output -

Col1 Col2 Col3
0 A 2 3
1 A 5 4
2 B 8 9
3 C 8 10

CodePudding user response:

Use:

A = [['A',2,3],['A',5,4],['B',8,9],['C',8,10],['C',9,20],['C',10,20]]
B = pd.DataFrame(A, columns = ['Col1','Col2','Col3'])

#Solution
B[B.groupby('Col1')['Col3'].transform(lambda x: x.drop_duplicates(keep=False))>0]

Output:

  Col1  Col2  Col3
0    A     2     3
1    A     5     4
2    B     8     9
3    C     8    10

Also, you can use:

B[B.groupby('Col1')['Col3'].transform(lambda x: x.drop_duplicates(keep=False)).notna()]

CodePudding user response:

A = [['A',2,3],['A',5,4],['B',8,9],['C',8,10],['C',9,20],['C',10,20]]
df = pd.DataFrame(A, columns = ['Col1','Col2','Col3'])

output = df[~df[['Col1', 'Col3']].duplicated(keep=False)]
print(output)

Output:

  Col1  Col2  Col3
0    A     2     3
1    A     5     4
2    B     8     9
3    C     8    10
  • Related