I have a DataFrame in which I'm trying to compare two columns; if they have roughly the same value in a given row, I want that row dropped. For example:
      A     B
1  3.21  3.15
2  6.98  2.07
3  5.41  8.95
4  0.32  0.30
I would want only rows 2/3 to remain in the df, because in rows 1/4 A and B are similar to each other.
I've tried something like: if the value in column A is within a range (+/- 15% of the value in column B for that row), remove that row, but it didn't work. Is there a built-in pandas function for this?
CodePudding user response:
You could do this by passing the rtol parameter to numpy.isclose:
import numpy as np

# keep only rows where A is NOT within 15% (relative to B) of B
result = df[~np.isclose(df.A, df.B, atol=0, rtol=0.15)]
#       A     B
# 2  6.98  2.07
# 3  5.41  8.95
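As a self-contained sketch of this approach, here it is run against the sample data from the question. The intermediate name `close` is only for illustration. Note that numpy.isclose(a, b) tests |a - b| <= atol + rtol * |b|, so the 15% tolerance is measured relative to column B, which matches what the question asks for.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with the same 1-based index
df = pd.DataFrame(
    {"A": [3.21, 6.98, 5.41, 0.32], "B": [3.15, 2.07, 8.95, 0.30]},
    index=[1, 2, 3, 4],
)

# True where |A - B| <= 0.15 * |B| (atol=0 disables the absolute tolerance)
close = np.isclose(df["A"], df["B"], rtol=0.15, atol=0)

# Keep only the rows where A and B are NOT close
result = df[~close]
print(result)
```

Setting atol=0 matters for small values: the default absolute tolerance is tiny (1e-08), but leaving it out makes the intent less explicit when you only want a relative comparison.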
CodePudding user response:
You could define your lower and upper bounds on permissible values:
lower = df["A"]*0.85
upper = df["A"]*1.15
and then filter using pandas.Series.between:
df[~df["B"].between(lower, upper)]
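Put together as a runnable sketch on the question's sample data (the names `similar` and `result` are only for illustration). Here the bounds are computed from column A, so the 15% is relative to A rather than B; swap the columns if you want it relative to B.

```python
import pandas as pd

# Sample data from the question, with the same 1-based index
df = pd.DataFrame(
    {"A": [3.21, 6.98, 5.41, 0.32], "B": [3.15, 2.07, 8.95, 0.30]},
    index=[1, 2, 3, 4],
)

# Per-row bounds: 15% below and above the value in column A
lower = df["A"] * 0.85
upper = df["A"] * 1.15

# True where B falls inside [lower, upper] for its own row (inclusive by default)
similar = df["B"].between(lower, upper)

# Keep only the rows where B is outside the band around A
result = df[~similar]
print(result)
```

One caveat, assuming your data can contain negative numbers: for a negative value in A, A * 0.85 is greater than A * 1.15, so the band is inverted and between never matches. In that case the numpy.isclose approach, which works on absolute differences, is probably the safer choice.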