How to write for loop to mutate several columns using dplyr?

Time:03-17

I have a dataframe where I want to: (1) make a backup of the original column; (2) recode the column values equal to 1; and (3) replace NA values only for the specified columns and not the entire dataframe.

My full dataset has a dozen columns formatted similarly.

q1_1 = c(1, 1, 1, NA, 0)
q1_2 = c(2, 2, 2, NA, 0)
df <- data.frame(q1_1, q1_2)

for (i in 1:2) {
  df <- df %>% 
    mutate(paste0("q1_", i, "_backup") = paste0("q1_", i),
           paste0("q1_new", i) = recode(paste0("q1_", i),
                           `i` = 1),
           paste0("q1_new", i) = replace_na(paste0("q1_", i), 0
           ))
}

I tried writing a for loop, but am getting an error message and don't understand how to diagnose the code.

> Error: unexpected '=' in: " df <- df %>% mutate(paste0("q1_", i, "backup") ="

> Error: Error: unexpected ',' in: " paste0("q1_new", i) = recode(paste0("q1_", i),  `i` = 1),"

> Error: unexpected ')' in: "           paste0("q1_new", i) = replace_na(paste0("q1_", i), 0 ))"

> Error: unexpected '}' in "}"

The result should look like this:

q1_1 = c(1, 1, 1, NA, 0)
q1_1_new = c(1, 1, 1, 0, 0)
q1_1_backup = c(1, 1, 1, NA, 0)

q1_2 = c(2, 2, 2, NA, 0)
q1_2_new = c(1, 1, 1, 0, 0)
q1_2_backup = c(2, 2, 2, NA, 0)

df <- data.frame(q1_1, q1_1_new, q1_1_backup, q1_2, q1_2_new, q1_2_backup)

User response:

When you want to mutate several columns the same way, the answer is across(), not a loop. I'm having trouble matching your code and description with your desired output, so here's a small example that (almost) matches it. The difference is that I kept the original data with the original column names and added _new to the modified values - it's easier that way.

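library(dplyr)

df %>%
  # recode every column: positive values become 1, zero and NA become 0
  mutate(across(everything(), 
    ~ coalesce(as.integer(.x > 0), 0L),
    .names = "{.col}_new"
  )) %>%
  # copy each original column unchanged into a *_backup column
  mutate(across(!contains("new"), identity, .names = "{.col}_backup"))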
#   q1_1 q1_2 q1_1_new q1_2_new q1_1_backup q1_2_backup
# 1    1    2        1        1           1           2
# 2    1    2        1        1           1           2
# 3    1    2        1        1           1           2
# 4   NA   NA        0        0          NA          NA
# 5    0    0        0        0           0           0

You can see how the new names are defined with {.col} being the original column name.
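To recode only some columns rather than the whole data frame, the first argument of across() accepts a tidyselect helper in place of everything(). Here is a minimal sketch, assuming the columns of interest share the q1_ prefix; the id column and the names df2 and result are hypothetical, added only to show that other columns are left alone.

```r
library(dplyr)

# Hypothetical data: `id` stands in for any column that should stay untouched
df2 <- data.frame(
  id   = letters[1:5],
  q1_1 = c(1, 1, 1, NA, 0),
  q1_2 = c(2, 2, 2, NA, 0)
)

result <- df2 %>%
  mutate(across(
    starts_with("q1_"),                  # select only the q1_* columns
    ~ coalesce(as.integer(.x > 0), 0L),  # positive -> 1, zero or NA -> 0
    .names = "{.col}_new"                # {.col} expands to q1_1, q1_2, ...
  ))

names(result)
# "id" "q1_1" "q1_2" "q1_1_new" "q1_2_new"
```

The id column is passed through unchanged because it is never selected, so no NA replacement happens outside the q1_ columns.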

The colwise vignette is a good read if you want to learn more about across().
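If a loop is still preferred, the attempt in the question most likely fails for two reasons: R's parser does not accept a function call such as paste0(...) on the left-hand side of = inside mutate(), and paste0("q1_", i) on the right-hand side is just a string, not the column it names. A sketch of one way around both, assuming a dplyr version (1.0 or later) that supports glue-style names with := and the .data pronoun; the helper variable col is introduced here for illustration:

```r
library(dplyr)

df <- data.frame(
  q1_1 = c(1, 1, 1, NA, 0),
  q1_2 = c(2, 2, 2, NA, 0)
)

for (i in 1:2) {
  col <- paste0("q1_", i)  # name of the column handled in this iteration
  df <- df %>%
    mutate(
      # "{col}_backup" := ... builds the new column name from the string in `col`
      "{col}_backup" := .data[[col]],
      # .data[[col]] looks the column up by name instead of using the literal string
      "{col}_new" := coalesce(as.integer(.data[[col]] > 0), 0L)
    )
}

names(df)
# "q1_1" "q1_2" "q1_1_backup" "q1_1_new" "q1_2_backup" "q1_2_new"
```

This produces the same values as the across() version, only with the new columns ordered per iteration; across() remains the shorter option when every column gets the same treatment.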
