Home > OS >  Filtering values in one dataframe if they are within a specified range of values in another datafram
Filtering values in one dataframe if they are within a specified range of values in another datafram

Time:07-27

We have two dataframes df1 and df2.

df1 <- data.frame(x = c(10, 12, 52, 102))
df2 <- data.frame(y = c(11, 15, 30, 130))

We want to filter rows in df1$x if they are within plus or minus 10 values of any value in 'df2$y'

so the result of the filter on df1 would be :

data.frame(x = c(10, 12))

because 10 and 12 in df1$x are within /- 10 of 11 and 15 in df2$y.

If we wanted to only filter rows that matched "exactly", we could use dplyr::left_join. But how can we do this if we want to use a range of values (i.e., /- 10) as described above?

CodePudding user response:

We may use power_inner_join

library(powerjoin)
library(dplyr)
power_inner_join(df1, df2, by = c(~ .x$x <= (.y$y   10 ) & 
     .x$x >= (.y$y - 10))) %>% 
   distinct(x)
   x
1 10
2 12

Or do a non-equi join with data.table

library(data.table)
setDT(df1)[setDT(df2)[, .(ystart = y-10, yend = y 10)], 
    flag := TRUE, on = .(x >= ystart, x <= yend)]
df1[as.logical(flag), .(x)]
       x
   <num>
1:    10
2:    12

CodePudding user response:

library(sqldf)
#> Loading required package: gsubfn
#> Loading required package: proto
#> Loading required package: RSQLite
#> Warning: package 'RSQLite' was built under R version 4.1.2

df1 <- data.frame(x = c(10, 12, 52, 102))
df2 <- data.frame(y = c(11, 15, 30, 130))

sqldf('
select
  distinct
  a.*
from
  df1 a
  join df2 b
    on a.x between b.y - 10 and b.y   10
')
#>    x
#> 1 10
#> 2 12

Created on 2022-07-26 by the reprex package (v2.0.1)

CodePudding user response:

## need drop = FALSE because df1 has just one column
df1[abs(df1$x - df2$y) <= 10, , drop = FALSE]
#   x
#1 10
#2 12

## no need to consider "drop"
subset(df1, abs(df1$x - df2$y) <= 10)
  • Related