Home > Back-end >  remove both duplicate rows from DataFrame with negative and positive values pands
remove both duplicate rows from DataFrame with negative and positive values pands

Time:07-13

DF csv

This CSV and i am using it as Dataframe

colA,colB,colC
ABC,3,token
ABC,50,added    
ABC,-50,deleted
xyz,20,token
pqr,50,added    
pqr,-50,deleted
lmn,50,added

output

colA,colB,colC
ABC,3,token
xyz,20,token
lmn,50,added

CodePudding user response:

Methods based on abs would incorrectly remove two positive or two negative values.

I suggest to perform a self-merge using the opposite of colB:

# get indices that have a matching positive/negative
idx = (df.reset_index()
         .merge(df, left_on=['colA', 'colB'], right_on=['colA', -df['colB']],
                how='inner')['index']
       )
# [1, 2, 4, 5] (as list)


# drop them
out = df.drop(idx)

output:

  colA  colB   colC
0  ABC     3  token
3  xyz    20  token
6  lmn    50  added

CodePudding user response:

You want to identify all the 'positive and negative' by taking the absolute value. Then drop duplicates based only on colB (ignoring the text in colC). Don't keep any (keep=False) to make sure both are dropped.

df.abs().drop_duplicates(subset='colB', keep=False)

A small advice, given this is your first post: Please always try to put complete code examples, and make sure you ask a clear question. That makes it easier for other people to understand your problem and reply. Also, if this answers your question, don't forget to "accept" this answer.

CodePudding user response:

not sure exactly what you mean, this gets the desired output though:

df['abs_colB'] = np.abs(df['colB'])
df.drop_duplicates('abs_colB', keep='last').iloc[:, :-1].reset_index(drop=True)
colA colB colC
0 ABC 3 token
1 xyz 20 token
2 lmn 50 added
  • Related