I would like to create a variable x
that is the product of variable e
and one of the variables a, b, c
, or d
. The resulting variable should take the value a * e
, but if a
is NA
, then it should take the value b * e
and if a & b
are missing, then it should take the value c * e
and so on.
For example: If my data frame looks like this:
df <- data.frame(a = c(1, 2, NA, NA, 5), b = c(NA, 1, NA, 4, 6), c = c(NA, 3, 3, 3, 7), d = c(1, 1, 1, 1, 1), e = c(1, 2, 3, 4, NA))
I would like to get to having the following result:
df$x <- c(1, 4, 9, 16, NA)
which I am trying to achieve the following way:
df <- df %>% mutate(x = case_when(!is.na(a) ~ a * e, is.na(a) ~ b * e, is.na(a) & is.na(b) ~ c * e, is.na(a) & is.na(b) & is.na(c) ~ d * e))
This, unfortunately is not working yet, since R is somehow not understanding is.na(a) & is.na(b)
as both values missing at the same time.
CodePudding user response:
You can use coalesce()
from dplyr
to find the first non-missing element.
library(dplyr)
df %>%
mutate(x = e * coalesce(a, b, c, d))
# a b c d e x
# 1 1 NA NA 1 1 1
# 2 2 1 3 1 2 4
# 3 NA NA 3 1 3 9
# 4 NA 4 3 1 4 16
# 5 5 6 7 1 NA NA
If you have lots of columns to be multiplied, you can automate it by using tidy-selection in across()
. (Edit: Thank @akrun for the improvement)
df %>%
mutate(x = e * do.call(coalesce, across(a:d)))
CodePudding user response:
df <-
data.frame(
a = c(1, 2, NA, NA, 5),
b = c(NA, 1, NA, 4, 6),
c = c(NA, 3, 3, 3, 7),
d = c(1, 1, 1, 1, 1),
e = c(1, 2, 3, 4, NA)
)
df$res <- df[, 5] * apply(df[-5], 1, function(x) x[which(x = !is.na(x))[1]])
df
#> a b c d e res
#> 1 1 NA NA 1 1 1
#> 2 2 1 3 1 2 4
#> 3 NA NA 3 1 3 9
#> 4 NA 4 3 1 4 16
#> 5 5 6 7 1 NA NA
Created on 2022-05-19 by the reprex package (v2.0.1)
CodePudding user response:
Here is one option with row/column
indexing
df$res <- df$e * df[-5][cbind(seq_len(nrow(df)),
max.col(!is.na(df[-5]), 'first'))]
df$res
[1] 1 4 9 16 NA