How do I add a new column through a conditional mutate but preserve the original dataframe?-CodePudding

I have a large dataframe (df) containing 500 rows, 50 columns/variables but only want to target specific variables.

targ_vars <- c("all3a1", "3a1_arc",
              "all3b1", "3b1_arc",
              "all3c1", "3c1_arc")

The vector above contains the variables which have frequency data (i.e. multiple rows with 1,2,3 etc.)

I want to add a new count column in the original large dataframe (df) which contains the row sum of any non-NA value specifically for those select variables in "targ_vars".

Again, I'm not trying to add the value of the actual frequency data across each of those variables, but moreso just a sum of any non-NA value per row (i.e. 1,2,NA,7,NA,1 = total row count of 4 non-NA).

I've gotten as far as this:

df <- df %>%
        select(targ_vars) %>%
        mutate(targ_var_count = rowSums(!is.na(.), na.rm = TRUE))

The problem is I'm not sure how to "deselect" the variables I used to run the mutate calculation. The line above would result in overwriting the entire original dataframe (df) containing 50 columns/vars, and placing back only the selected 6 variables in (targ_vars) plus the new (targ_var_count) variable that mutate calculated.

Essentially, I just want to focus on that last mutate line, and plop that new count column back into the original (df).

I tried something like the one below but it ended up giving me a list when I call "df$targcount" instead of just 1 rowSum column:

df$targcount <- df %>%
    select(targ_vars) %>%
    mutate(targcount = rowSums(!is.na(.), na.rm = TRUE))

Any help/tips would be appreciated.

CodePudding user response：

You could use dplyr::across to get the count of non NA values for just your targ_vars columns.

Using some fake random example data:

set.seed(123)

dat <- data.frame(
  a = sample(c(0, NA), 10, replace = TRUE),
  b = sample(c(0, NA), 10, replace = TRUE),
  c = sample(c(0, NA), 10, replace = TRUE),
  d = sample(c(0, NA), 10, replace = TRUE)
)

targ_vars <- c("c", "d")

library(dplyr, w = FALSE)

dat %>%
  mutate(targcount = rowSums(across(all_of(targ_vars), ~ !is.na(.x))))
#>     a  b  c  d targcount
#> 1   0 NA  0  0         2
#> 2   0 NA NA NA         0
#> 3   0 NA  0  0         2
#> 4  NA  0  0 NA         1
#> 5   0 NA  0 NA         1
#> 6  NA  0  0  0         2
#> 7  NA NA NA  0         1
#> 8  NA  0 NA  0         1
#> 9   0  0  0  0         2
#> 10  0  0 NA NA         0