Home > Software design >  Match value in column within a range of another column on the same dataframe and return correspondin
Match value in column within a range of another column on the same dataframe and return correspondin

Time:09-29

I am trying to find the best way to get the value of column Y which falls in the range of column Z and return the corresponding row.

For instance, consider A in column X, the value in Y is 25 so it falls between 20 and 30 so the corresponding value in W is 2.

## example dataframe
df <- data.frame(X=c("A","A","A","A","B","B","B","C","C","C"),
                 Y=c(25,25,25,25,35,35,35,15,15,15),
                 Z=c(10,20,30,40,10,20,30,10,20,30),
                 w=c(1,2,3,4,1,2,3,1,2,3))

#desired dataframe
df2 <- data.frame(X=c("A","B","C"),
                  Y=c(25,35,15),
                  Z=c(20,30,10),
                  w=c(2,3,1)) 

Appreciate any help

CodePudding user response:

There are couple of ways to do this (depending on the logic). One way is to group by 'X', get the row with the minimum absolute difference between 'Z' and 'Y'. Note that for 'B' elements in 'X', the row selected is the last row where we don't know if the Y will be in range or not

library(dplyr)
df %>%
    group_by(X) %>%
    slice(which.min(abs(Z - Y))) %>%
    ungroup

-output

# A tibble: 3 × 4
  X         Y     Z     w
  <chr> <dbl> <dbl> <dbl>
1 A        25    20     2
2 B        35    30     3
3 C        15    10     1

Or another option is to create a lead column after grouping by 'X', create a logical vector in filter and use if/else to return cases where there is no upper range i.e. for 'B'

df %>% 
   group_by(X) %>%
   mutate(Z1 = lead(Z, default = last(Z))) %>% 
   filter({tmp <- Y > Z & Y <Z1
     if(any(tmp)) tmp else row_number() == n()
    }) %>%
    ungroup %>% 
    select(-Z1)

CodePudding user response:

Perhaps we can try the data.table like below

> setDT(df)[, .SD[findInterval(Y, Z) == seq_along(w)], X]
   X  Y  Z w
1: A 25 20 2
2: B 35 30 3
3: C 15 10 1
  • Related