remove both duplicate rows from DataFrame with negative and positive values pands-CodePudding

DF csv

This CSV and i am using it as Dataframe

colA,colB,colC
ABC,3,token
ABC,50,added    
ABC,-50,deleted
xyz,20,token
pqr,50,added    
pqr,-50,deleted
lmn,50,added

output

colA,colB,colC
ABC,3,token
xyz,20,token
lmn,50,added

CodePudding user response：

Methods based on abs would incorrectly remove two positive or two negative values.

I suggest to perform a self-merge using the opposite of colB:

# get indices that have a matching positive/negative
idx = (df.reset_index()
         .merge(df, left_on=['colA', 'colB'], right_on=['colA', -df['colB']],
                how='inner')['index']
       )
# [1, 2, 4, 5] (as list)


# drop them
out = df.drop(idx)

output:

  colA  colB   colC
0  ABC     3  token
3  xyz    20  token
6  lmn    50  added

CodePudding user response：

You want to identify all the 'positive and negative' by taking the absolute value. Then drop duplicates based only on colB (ignoring the text in colC). Don't keep any (keep=False) to make sure both are dropped.

df.abs().drop_duplicates(subset='colB', keep=False)

A small advice, given this is your first post: Please always try to put complete code examples, and make sure you ask a clear question. That makes it easier for other people to understand your problem and reply. Also, if this answers your question, don't forget to "accept" this answer.

CodePudding user response：

not sure exactly what you mean, this gets the desired output though:

df['abs_colB'] = np.abs(df['colB'])
df.drop_duplicates('abs_colB', keep='last').iloc[:, :-1].reset_index(drop=True)

	colA	colB	colC
0	ABC	3	token
1	xyz	20	token
2	lmn	50	added