I have a large dataframe (df) containing 500 rows, 50 columns/variables but only want to target specific variables.
targ_vars <- c("all3a1", "3a1_arc",
"all3b1", "3b1_arc",
"all3c1", "3c1_arc")
The vector above contains the variables which have frequency data (i.e. multiple rows with 1,2,3 etc.)
I want to add a new count column in the original large dataframe (df) which contains the row sum of any non-NA value specifically for those select variables in "targ_vars".
Again, I'm not trying to add the value of the actual frequency data across each of those variables, but moreso just a sum of any non-NA value per row (i.e. 1,2,NA,7,NA,1 = total row count of 4 non-NA).
I've gotten as far as this:
df <- df %>%
select(targ_vars) %>%
mutate(targ_var_count = rowSums(!is.na(.), na.rm = TRUE))
The problem is I'm not sure how to "deselect" the variables I used to run the mutate calculation. The line above would result in overwriting the entire original dataframe (df) containing 50 columns/vars, and placing back only the selected 6 variables in (targ_vars) plus the new (targ_var_count) variable that mutate calculated.
Essentially, I just want to focus on that last mutate line, and plop that new count column back into the original (df).
I tried something like the one below but it ended up giving me a list when I call "df$targcount" instead of just 1 rowSum column:
df$targcount <- df %>%
select(targ_vars) %>%
mutate(targcount = rowSums(!is.na(.), na.rm = TRUE))
Any help/tips would be appreciated.
CodePudding user response:
You could use dplyr::across
to get the count of non NA
values for just your targ_vars
columns.
Using some fake random example data:
set.seed(123)
dat <- data.frame(
a = sample(c(0, NA), 10, replace = TRUE),
b = sample(c(0, NA), 10, replace = TRUE),
c = sample(c(0, NA), 10, replace = TRUE),
d = sample(c(0, NA), 10, replace = TRUE)
)
targ_vars <- c("c", "d")
library(dplyr, w = FALSE)
dat %>%
mutate(targcount = rowSums(across(all_of(targ_vars), ~ !is.na(.x))))
#> a b c d targcount
#> 1 0 NA 0 0 2
#> 2 0 NA NA NA 0
#> 3 0 NA 0 0 2
#> 4 NA 0 0 NA 1
#> 5 0 NA 0 NA 1
#> 6 NA 0 0 0 2
#> 7 NA NA NA 0 1
#> 8 NA 0 NA 0 1
#> 9 0 0 0 0 2
#> 10 0 0 NA NA 0