DF csv
This CSV and i am using it as Dataframe
colA,colB,colC
ABC,3,token
ABC,50,added
ABC,-50,deleted
xyz,20,token
pqr,50,added
pqr,-50,deleted
lmn,50,added
output
colA,colB,colC
ABC,3,token
xyz,20,token
lmn,50,added
CodePudding user response:
Methods based on abs
would incorrectly remove two positive or two negative values.
I suggest to perform a self-merge using the opposite of colB:
# get indices that have a matching positive/negative
idx = (df.reset_index()
.merge(df, left_on=['colA', 'colB'], right_on=['colA', -df['colB']],
how='inner')['index']
)
# [1, 2, 4, 5] (as list)
# drop them
out = df.drop(idx)
output:
colA colB colC
0 ABC 3 token
3 xyz 20 token
6 lmn 50 added
CodePudding user response:
You want to identify all the 'positive and negative' by taking the absolute value. Then drop duplicates based only on colB
(ignoring the text in colC
). Don't keep any (keep=False
) to make sure both are dropped.
df.abs().drop_duplicates(subset='colB', keep=False)
A small advice, given this is your first post: Please always try to put complete code examples, and make sure you ask a clear question. That makes it easier for other people to understand your problem and reply. Also, if this answers your question, don't forget to "accept" this answer.
CodePudding user response:
not sure exactly what you mean, this gets the desired output though:
df['abs_colB'] = np.abs(df['colB'])
df.drop_duplicates('abs_colB', keep='last').iloc[:, :-1].reset_index(drop=True)
colA | colB | colC | |
---|---|---|---|
0 | ABC | 3 | token |
1 | xyz | 20 | token |
2 | lmn | 50 | added |