Function that compares one column values against all other column values and returns matching one in

Time:11-17

Let's say I have two data frames:

df1 <- data.frame(n = rep(c(0, 1, 2, 3, 4), times = 2), nn = c(rep(x = 1, 5), rep(x = 2, 5)),
                  y = rnorm(10), z = rnorm(10))

df2 <- data.frame(x = rnorm(20))

Here is the first df:

> head(df1)
  n nn          y           z
1 0  1  1.5683647  0.48934096
2 1  1  1.2967556 -0.77891030
3 2  1 -0.2375963  1.74355935
4 3  1 -1.2241501 -0.07838729
5 4  1 -0.3278127 -0.97555379
6 0  2 -2.4124503  0.07065982

Here is the second df:

         x
1 -0.4884289
2  0.9362939
3 -1.0624084
4 -0.9838209
5  0.4242479
6 -0.4513135

I'd like to subtract the x column values of df2 from the z column values of df1, and return the rows of both data frames for which the difference is approximately equal to the y value of df1. Is there a way to construct such a function so that I can specify how close the values need to be? To be clear, I'd like to subtract every x value from every z value, compare each result to the y column value of df1, and check whether any of them approximately matches y.

CodePudding user response:

Here's an approach where I match every row of df1 with every row of df2, then subtract x and y from z (following your logic of comparing z-x to y, which is the same as comparing z-x-y to zero). Finally, for each row of df1, I keep the match with the lowest absolute difference.

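library(dplyr)

# Cross join: the constant `dummy` key pairs every row of df1 with every row of df2.
# (dplyr >= 1.1.0 also offers cross_join() for this.)
left_join(
    df1 %>% mutate(dummy = 1, row = row_number()),
    df2 %>% mutate(dummy = 1, row = row_number()), by = "dummy") %>%
    mutate(diff = z - x - y) %>%   # zero when z - x equals y exactly
    group_by(row.x) %>%            # row.x is the df1 row index
    slice_min(abs(diff)) %>%       # keep the df2 row with the smallest |diff|
    ungroup()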

Result (I used set.seed(42) before generating df1 and df2):

# A tibble: 10 x 9
       n    nn       y      z dummy row.x       x row.y    diff
   <dbl> <dbl>   <dbl>  <dbl> <dbl> <int>   <dbl> <int>   <dbl>
 1     0     1  1.37    1.30      1     1  0.0361    20 -0.102 
 2     1     1 -0.565   2.29      1     2  1.90       5  0.956 
 3     2     1  0.363  -1.39      1     3 -1.76       8  0.0112
 4     3     1  0.633  -0.279     1     4 -0.851     18 -0.0607
 5     4     1  0.404  -0.133     1     5 -0.609     14  0.0713
 6     0     2 -0.106   0.636     1     6  0.705     12  0.0372
 7     1     2  1.51   -0.284     1     7 -1.78       2 -0.0145
 8     2     2 -0.0947 -2.66      1     8 -2.41      19 -0.148 
 9     3     2  2.02   -2.44      1     9 -2.41      19 -2.04  
10     4     2 -0.0627  1.32      1    10  1.21       4  0.168 
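
Note that the closest match is not always a close one: row 9 above has a diff of -2.04. If you want to set the tolerance yourself, as the question asks, one possible sketch in base R is below. The function name match_within, the argument tol, and the output columns row_df1 and row_df2 are made up for illustration; it assumes df1 has numeric columns y and z and df2 has a numeric column x, and the toy data is invented only to show the call.

```r
# Hypothetical helper: return every (df1 row, df2 row) pair with |z - x - y| <= tol.
match_within <- function(df1, df2, tol = 0.1) {
  # d[i, j] = z_i - x_j - y_i; y is recycled down the columns, so row i gets y_i
  d <- outer(df1$z, df2$x, "-") - df1$y
  # Two-column matrix of (row, col) positions that fall within the tolerance
  hits <- which(abs(d) <= tol, arr.ind = TRUE)
  r <- hits[, 1]  # indices into df1
  k <- hits[, 2]  # indices into df2
  out <- data.frame(row_df1 = r, row_df2 = k,
                    df1[r, , drop = FALSE],
                    x = df2$x[k],
                    diff = d[hits])
  rownames(out) <- NULL
  out
}

# Toy data (invented for this example)
toy1 <- data.frame(n = c(0, 1), nn = c(1, 1), y = c(1, 5), z = c(3, 0))
toy2 <- data.frame(x = c(2, 2.05, -5, 10))

loose <- match_within(toy1, toy2, tol = 0.1)   # df1 row 1 pairs with df2 rows 1 and 2; df1 row 2 with df2 row 3
tight <- match_within(toy1, toy2, tol = 0.01)  # the 2.05 pair drops out
```

Unlike slice_min(), this can return several df2 rows for one df1 row, or none at all, depending on tol. The same idea should also work in the dplyr pipeline by replacing the group_by/slice_min steps with filter(abs(diff) <= tol).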
Tags: r