I am trying to find the best way to get the value of column Y which falls in the range of column Z and return the corresponding row.
For instance, consider A in column X, the value in Y is 25 so it falls between 20 and 30 so the corresponding value in W is 2.
## example dataframe
df <- data.frame(X=c("A","A","A","A","B","B","B","C","C","C"),
Y=c(25,25,25,25,35,35,35,15,15,15),
Z=c(10,20,30,40,10,20,30,10,20,30),
w=c(1,2,3,4,1,2,3,1,2,3))
#desired dataframe
df2 <- data.frame(X=c("A","B","C"),
Y=c(25,35,15),
Z=c(20,30,10),
w=c(2,3,1))
Appreciate any help
CodePudding user response:
There are couple of ways to do this (depending on the logic). One way is to group by 'X', get the row with the min
imum abs
olute difference between 'Z' and 'Y'. Note that for 'B' elements in 'X', the row selected is the last row where we don't know if the Y will be in range or not
library(dplyr)
df %>%
group_by(X) %>%
slice(which.min(abs(Z - Y))) %>%
ungroup
-output
# A tibble: 3 × 4
X Y Z w
<chr> <dbl> <dbl> <dbl>
1 A 25 20 2
2 B 35 30 3
3 C 15 10 1
Or another option is to create a lead
column after grouping by 'X', create a logical vector in filter
and use if/else
to return cases where there is no upper range i.e. for 'B'
df %>%
group_by(X) %>%
mutate(Z1 = lead(Z, default = last(Z))) %>%
filter({tmp <- Y > Z & Y <Z1
if(any(tmp)) tmp else row_number() == n()
}) %>%
ungroup %>%
select(-Z1)
CodePudding user response:
Perhaps we can try the data.table
like below
> setDT(df)[, .SD[findInterval(Y, Z) == seq_along(w)], X]
X Y Z w
1: A 25 20 2
2: B 35 30 3
3: C 15 10 1