I am trying to replace all values in a vector with NA (by group) if at least two values are greater than 4 where x is between 2 and 3.
In this example, in group a, there are 2 values greater than 4 for 2 <= x <= 3.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
tibble(
  grp = c("a", "a", "a", "b", "b", "b"),
  x = c(1, 2, 3, 1, 2, 3),
  val = c(4, 5, 6, 1, 2, 1)
) %>%
  group_by(grp) %>%
  mutate(val2 = ifelse(sum(val[between(x, 2, 3)] > 4) >= 2, NA, val))
#> # A tibble: 6 × 4
#> # Groups: grp [2]
#> grp x val val2
#> <chr> <dbl> <dbl> <dbl>
#> 1 a 1 4 NA
#> 2 a 2 5 NA
#> 3 a 3 6 NA
#> 4 b 1 1 1
#> 5 b 2 2 1
#> 6 b 3 1 1
Expected output
tibble(
  grp = c("a", "a", "a", "b", "b", "b"),
  x = c(1, 2, 3, 1, 2, 3),
  val = c(4, 5, 6, 1, 2, 1),
  val2 = c(NA, NA, NA, 1, 2, 1)
)
#> # A tibble: 6 × 4
#> grp x val val2
#> <chr> <dbl> <dbl> <dbl>
#> 1 a 1 4 NA
#> 2 a 2 5 NA
#> 3 a 3 6 NA
#> 4 b 1 1 1
#> 5 b 2 2 2
#> 6 b 3 1 1
Created on 2021-10-25 by the reprex package (v2.0.1)
CodePudding user response:
The problem is that ifelse returns a vector with the same length as its first argument (the test). Since sum(val[between(x, 2, 3)] > 4) >= 2 returns a logical vector of length 1, only the first element of val is returned, and that single value is then recycled to the full length of the group. That is why group b ends up with 1, 1, 1 instead of 1, 2, 1. For example, ifelse(TRUE, 1:3, 11:13) will only return 1. You could either use rep to repeat the condition for the full length of the group:
mutate(val2 = ifelse(rep(sum(val[between(x, 2, 3)] > 4) >= 2, n()), NA, val))
or use a standard if/else statement, which evaluates the length-1 condition once per group:
mutate(val2 = if(sum(val[between(x, 2, 3)] > 4) >= 2) NA else val)
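To see both fixes side by side, here is a minimal sketch that reuses the data from the question. The names df, res, val2_rep and val2_if are only illustrative, and NA_real_ in the if/else branch is my own addition so that both branches are explicitly double; plain NA, as in the line above, should behave the same way.

```r
library(dplyr)

# ifelse() returns a result with the same length as its test argument
ifelse(TRUE, 1:3, 11:13)          # length-1 test: only 1 is returned
ifelse(rep(TRUE, 3), 1:3, 11:13)  # length-3 test: 1 2 3

# Same data as in the question
df <- tibble(
  grp = c("a", "a", "a", "b", "b", "b"),
  x   = c(1, 2, 3, 1, 2, 3),
  val = c(4, 5, 6, 1, 2, 1)
)

res <- df %>%
  group_by(grp) %>%
  mutate(
    # Fix 1: repeat the length-1 condition n() times so ifelse() sees a full-length test
    val2_rep = ifelse(rep(sum(val[between(x, 2, 3)] > 4) >= 2, n()), NA, val),
    # Fix 2: plain if/else; the length-1 NA is recycled to the group size by mutate()
    val2_if  = if (sum(val[between(x, 2, 3)] > 4) >= 2) NA_real_ else val
  ) %>%
  ungroup()

res
```

Both new columns should match the expected val2 from the question: NA for every row of group a, and the original values 1, 2, 1 for group b.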