I want to apply a conditional rowMeans to each group of IDs using dplyr. The reproducible table follows:
dat <- as.Date("2021/08/04")
len <- 4
seq(dat, by = "day", length.out = len)
input <- data.frame(
date = c(seq(dat, by = "day", length.out = len) , seq(dat, by = "day", length.out = len)),
user_id = c(rep("aa",4),rep("bb",4)),
var1 = c(1:4),
var2 = c(4,7,23,9,0,0,0,0),
var3 = c(0,0,0,0,4,8,2,7)
)
The logic is: For each ID, first select the columns which are numeric and their sum is =!0, and then compute the rowMean of the selected columns for that ID.
The desired output table follows:
output = data.frame(
date = c(seq(dat, by = "day", length.out = len) , seq(dat, by = "day", length.out = len)),
user_id = c(rep("aa",4),rep("bb",4)),
var1 = c(1:4),
var2 = c(4,7,23,9,0,0,0,0),
var3 = c(0,0,0,0,4,8,2,7),
rowAverage = as.numeric(c(rowMeans(input[1:4,3:4]), rowMeans(input[5:8,c(3,5)])))
)
The lines that I came-up with follow, but I get some errors:
output = input %>%
dplyr::group_by(user_id) %>%
dplyr::mutate(rowAverage = rowMeans(select_if(function(x) {(is.numeric(x)) && (sum(x)=!0)})))
Could you please help me correct this? Thank you.
CodePudding user response:
You’re very close. You just need to supply the data to select_if()
, too.
Use cur_data()
for that.
input %>%
group_by(user_id) %>%
mutate(rowAverage = rowMeans(
select_if(cur_data(), \(x) is.numeric(x) && sum(x) != 0)
))
#> # A tibble: 8 × 6
#> # Groups: user_id [2]
#> date user_id var1 var2 var3 rowAverage
#> <date> <chr> <int> <dbl> <dbl> <dbl>
#> 1 2021-08-04 aa 1 4 0 2.5
#> 2 2021-08-05 aa 2 7 0 4.5
#> 3 2021-08-06 aa 3 23 0 13
#> 4 2021-08-07 aa 4 9 0 6.5
#> 5 2021-08-04 bb 1 0 4 2.5
#> 6 2021-08-05 bb 2 0 8 5
#> 7 2021-08-06 bb 3 0 2 2.5
#> 8 2021-08-07 bb 4 0 7 5.5
CodePudding user response:
Though the question is tagged dplyr
, dplyr
is not very good at processing by rows so here is a base R solution.
res <- by(input, input$user_id, \(X) {
i <- sapply(X, \(x) is.numeric(x) && sum(x, na.rm = TRUE) != 0)
X$rowAverage <- rowMeans(X[i])
X
})
res <- do.call(rbind, res)
res
#> date user_id var1 var2 var3 rowAverage
#> aa.1 2021-08-04 aa 1 4 0 2.5
#> aa.2 2021-08-05 aa 2 7 0 4.5
#> aa.3 2021-08-06 aa 3 23 0 13.0
#> aa.4 2021-08-07 aa 4 9 0 6.5
#> bb.5 2021-08-04 bb 1 0 4 2.5
#> bb.6 2021-08-05 bb 2 0 8 5.0
#> bb.7 2021-08-06 bb 3 0 2 2.5
#> bb.8 2021-08-07 bb 4 0 7 5.5
Created on 2022-10-21 with reprex v2.0.2