I have a dataframe:
dat <- data.frame(X1 = c(0, NA, NA),
                  X2 = c(1, NA, NA),
                  X3 = c(1, NA, NA),
                  Y1 = c(1, NA, NA),
                  Y2 = c(NA, NA, NA),
                  Y3 = c(0, NA, NA))
I want to create a composite score for X and Y variables. This is what I have so far:
library(dplyr)

clean_dat <- dat %>%
  rowwise() %>%
  mutate(X = sum(c(X1, X2, X3), na.rm = TRUE),
         Y = sum(c(Y1, Y2, Y3), na.rm = TRUE))
However, I want the composite score for the rows with all NAs (i.e. rows 2 and 3) to be 0 in the columns X and Y. Does anyone know how to do this?
Edit: I'd like to know how I can make X and Y in rows 2 and 3 NA too.
Thanks so much!
CodePudding user response:
By default, sum or rowSums returns 0 when na.rm = TRUE and all the elements are NA. To prevent this, use either an if/else or a case_when approach: check whether a row has any non-NA elements with if_any, then take the rowSums of the relevant columns within case_when. Rows that match no condition in case_when return NA by default.
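The default behaviour described above can be checked directly in base R. This is a small sketch; the names all_na and m are made up for the illustration.

```r
# A vector that is entirely NA (all_na is a hypothetical name)
all_na <- c(NA_real_, NA_real_, NA_real_)

sum(all_na)                # NA: missing values propagate
sum(all_na, na.rm = TRUE)  # 0: every element is dropped, and an empty sum is 0

# rowSums() behaves the same way on an all-NA row.
# m is a hypothetical 2 x 3 matrix: row 1 is (0, 1, 1), row 2 is all NA.
m <- matrix(c(0, NA, 1, NA, 1, NA), nrow = 2)
rowSums(m, na.rm = TRUE)   # 2 0
```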
library(dplyr)

# Sum only the rows that have at least one non-NA value;
# rows matching no case_when condition are left as NA.
dat %>%
  mutate(X = case_when(if_any(starts_with('X'), complete.cases) ~
                         rowSums(across(starts_with('X')), na.rm = TRUE)),
         Y = case_when(if_any(starts_with('Y'), complete.cases) ~
                         rowSums(across(starts_with('Y')), na.rm = TRUE)))
Output:
  X1 X2 X3 Y1 Y2 Y3  X  Y
1  0  1  1  1 NA  0  2  1
2 NA NA NA NA NA NA NA NA
3 NA NA NA NA NA NA NA NA
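The if/else route mentioned in the answer might look like the sketch below, which stays close to the rowwise() code from the question. The helper sum_or_na() is not part of dplyr or base R; it is a hypothetical name introduced here for illustration.

```r
library(dplyr)

dat <- data.frame(X1 = c(0, NA, NA),
                  X2 = c(1, NA, NA),
                  X3 = c(1, NA, NA),
                  Y1 = c(1, NA, NA),
                  Y2 = c(NA, NA, NA),
                  Y3 = c(0, NA, NA))

# Hypothetical helper: NA when every value is missing,
# otherwise the sum of the non-missing values.
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

clean_dat <- dat %>%
  rowwise() %>%
  mutate(X = sum_or_na(c(X1, X2, X3)),
         Y = sum_or_na(c(Y1, Y2, Y3))) %>%
  ungroup()  # drop the rowwise grouping once the scores are computed
```

On this data, row 1 should get X = 2 and Y = 1, while rows 2 and 3 stay NA in both columns. The rowwise() version is easy to read but is likely to be slower than the rowSums approach on large data frames, since the helper runs once per row.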