Filter a data frame by row values but with tolerance

Time:12-16

df <- data.frame(x = c(6.00001, 6.00000, 5.99999, 5, 2), y = c(1, 2, 3, 4, 5))

        x y
1 6.00001 1
2 6.00000 2
3 5.99999 3
4 5.00000 4
5 2.00000 5

I can use df[df$x == 6, ] to quickly return the rows where x == 6:

  x y
2 6 2

But what if I want to allow a tolerance? all.equal() does not seem to work here:

df[all.equal(df$x, 6, 0.0001), ]
    x  y
NA NA NA

If I want to find the rows where x is very close to 6, is there a short way to do it? Expected output:

        x y
1 6.00001 1
2 6.00000 2
3 5.99999 3

CodePudding user response:

You can use near() from dplyr, which is a wrapper for abs(x - y) < tol:

library(dplyr)

df %>%
  filter(near(x, 6, tol = 1e-04))

        x y
1 6.00001 1
2 6.00000 2
3 5.99999 3
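
If you would rather not load dplyr, the same comparison can be written directly in base R. This is a minimal sketch of the abs(x - y) < tol check described above, run on the question's data; the names tol and close_rows are only illustrative.

```r
df <- data.frame(x = c(6.00001, 6.00000, 5.99999, 5, 2), y = c(1, 2, 3, 4, 5))

tol <- 1e-4  # absolute tolerance, same value as in the near() call

# Keep the rows whose x lies within tol of 6
close_rows <- df[abs(df$x - 6) < tol, ]
close_rows
```

This should return the same three rows as the near() version.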

CodePudding user response:

Using round() to compare the values at four decimal places:

df[ round(df$x, 4) == 6, ]
#         x y
# 1 6.00001 1
# 2 6.00000 2
# 3 5.99999 3
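
One caveat, offered as a hedged aside: rounding is not quite the same test as a tolerance of 1e-4. round(x, 4) == 6 only accepts values that round to exactly 6 at four decimals, which is roughly anything within 5e-5 of 6. The value 6.00007 below is a made-up example, not part of the question's data.

```r
x <- 6.00007  # hypothetical value, about 7e-5 away from 6

within_tol  <- abs(x - 6) < 1e-4   # tolerance test
rounds_to_6 <- round(x, 4) == 6    # rounding test: x rounds to 6.0001

c(within_tol = within_tol, rounds_to_6 = rounds_to_6)
```

Here the tolerance test passes while the rounding test does not, so pick whichever definition of "close" you actually mean.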

CodePudding user response:

The reason that df[all.equal(df$x, 6, 0.0001), ] produces NA output is twofold.

Firstly, all.equal() compares the entire object, rather than recycling the shorter vector and doing element-wise comparison.

Check out this example:

all.equal(
    target = c(1, 1), 
    current = 1,
    tolerance = 1e-7
)
# [1] "Numeric: lengths (2, 1) differ"

Secondly, there is the output it produces. As you can see, it is a character vector rather than a logical one, so it is not a usable index for subsetting a data frame.

The reason for this is in the docs, which say that the return value is:

Either TRUE (NULL for attr.all.equal) or a vector of mode "character" describing the differences between target and current.

You can also see this in the all.equal.numeric() source. The logic is that it tries to build a message (msg) explaining the difference between current and target. If such a message exists it returns it, otherwise it returns TRUE:

if(is.null(msg)) TRUE else msg
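
Both points can be checked on the question's data. The sketch below is only illustrative (res and bad are names chosen here): it confirms that all.equal() hands back a character vector, and that indexing rows by that string matches no row name, which is where the single row of NAs comes from.

```r
df <- data.frame(x = c(6.00001, 6.00000, 5.99999, 5, 2), y = c(1, 2, 3, 4, 5))

res <- all.equal(df$x, 6, 0.0001)
is.character(res)  # a message instead of TRUE, because the lengths differ

# A character index is matched against row names; no row has this name,
# so the result is one row filled with NA.
bad <- df[res, ]
bad
```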

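So if you want to use all.equal() (and I can see why you would), you could use sapply() to do the comparison element by element, test whether each result is a logical value (that is, TRUE) and subset on that basis. Keep in mind that all.equal() measures the tolerance relative to the target by default, so tol = 1e-3 here allows a difference of up to about 6e-3:

target <- 6
tol <- 1e-3
# all.equal() returns TRUE on a match and a character message otherwise,
# so is.logical() picks out the matches. The \(current) shorthand needs R >= 4.1.
df[
    sapply(
        df$x,
        \(current) is.logical(all.equal(target, current, tolerance = tol))
    ),
]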

#         x y
# 1 6.00001 1
# 2 6.00000 2
# 3 5.99999 3
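
A variant of the same idea, offered as a sketch rather than a replacement: isTRUE() is the test that the all.equal() documentation recommends, vapply() guarantees a logical result, and scale = 1 asks all.equal() for an absolute comparison, so that 1e-4 means the same thing as in the near() answer. The name is_close is chosen here for illustration.

```r
df <- data.frame(x = c(6.00001, 6.00000, 5.99999, 5, 2), y = c(1, 2, 3, 4, 5))

target <- 6
tol <- 1e-4

# scale = 1 makes the tolerance absolute instead of relative to the target
is_close <- vapply(
    df$x,
    function(current) isTRUE(all.equal(target, current, tolerance = tol, scale = 1)),
    logical(1)
)

df[is_close, ]
```

For a plain numeric column this is more machinery than abs(df$x - target) < tol, so it is mainly worth it if you want all.equal()'s other checks as well.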