I have a data frame df
which looks something like what you see below.
df <- data.frame(id = c("abc", "def", "ghi", "jkl", "mno"),
flag_V1 = c(2, 1, 1, 0, 1),
V1 = c(600, 1000, 500, NA, 700),
flag_V2 = c(1, 1, 0, 2, 0),
V2 = c(400, 600, 100, 700, NA),
flag_V3 = c(1, 2, 1, 0, 1),
V3 = c(600, 300, 600, NA, 700))
> df
id flag_V1 V1 flag_V2 V2 flag_V3 V3
1 abc 2 600 1 400 1 600
2 def 1 1000 1 600 2 300
3 ghi 1 500 0 100 1 600
4 jkl 0 NA 2 700 0 NA
5 mno 1 700 0 NA 1 700
Essentially, each column V1
, V2
, and V3
comes with a flag variable flag_V1
, flag_V2
, and flag_V3
which contain information about how the values in V1
, V2
, and V3
where collected.
As you can see, whenever the flag variable reports 0
, the value in the associated column is NA
. I want to replace these missing values with zeroes if (and only if) the flag variable reports 0
. I want something like this
df_modified <- df %>% mutate(V1 = ifelse(flag_V1 == 0, 0, V1))
for every column that comes with a flag variable. The actual data that I'm working with has hundreds of columns, and not all of them come with a flag variable.
Is there a simple way to perform the above operation on many columns without having to do it one by one, preferably using dplyr-like syntax and avoiding traditional loops? I want to highlight again that I don't want to change all missing values to zeroes; it is very important that this is only done when the flag variable reports 0
. Thank you!
CodePudding user response:
Use across
twice with replace
:
library(dplyr)
df %>%
mutate(replace(across(matches("^V")), across(matches("^flag")) == 0, 0))
id flag_V1 V1 flag_V2 V2 flag_V3 V3
1 abc 2 600 1 400 1 600
2 def 1 1000 1 600 2 300
3 ghi 1 500 0 0 1 600
4 jkl 0 0 2 700 0 0
5 mno 1 700 0 0 1 700