I have a data frame in which I would like to go through all columns that end with _qc and, if the value is "4", set the corresponding column without the _qc suffix to NA.
For example, if the value in a column named chla_adjusted_qc is "4", then set the value of chla_adjusted to NA.
library(tidyverse)
df <- tibble(
  chla_adjusted = c(100, 2),
  chla_adjusted_qc = c("4", "1"),
  bbp_adjusted = c(0.1, 9999),
  bbp_adjusted_qc = c("2", "4")
)
df
#> # A tibble: 2 × 4
#> chla_adjusted chla_adjusted_qc bbp_adjusted bbp_adjusted_qc
#> <dbl> <chr> <dbl> <chr>
#> 1 100 4 0.1 2
#> 2 2 1 9999 4
The desired output would be
tibble(
  chla_adjusted = c(NA, 2),
  chla_adjusted_qc = c("4", "1"),
  bbp_adjusted = c(0.1, NA),
  bbp_adjusted_qc = c("2", "4")
)
#> # A tibble: 2 × 4
#> chla_adjusted chla_adjusted_qc bbp_adjusted bbp_adjusted_qc
#> <dbl> <chr> <dbl> <chr>
#> 1 NA 4 0.1 2
#> 2 2 1 NA 4
What I have done so far is grab the current column name and derive the name of the corresponding column in which I want to set the NA value.
df |>
  mutate(across(ends_with("_qc"), \(var) {
    # If var is chla_adjusted_qc, then let's modify the value in chla_adjusted
    col <- str_remove(cur_column(), "_qc")
    # if (var == "4") {
    #   # What to do here?
    # }
  }))
#> # A tibble: 2 × 4
#> chla_adjusted chla_adjusted_qc bbp_adjusted bbp_adjusted_qc
#> <dbl> <chr> <dbl> <chr>
#> 1 100 chla_adjusted 0.1 bbp_adjusted
#> 2 2 chla_adjusted 9999 bbp_adjusted
Thank you.
Created on 2022-12-20 with reprex v2.0.2
CodePudding user response:
df %>%
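  mutate(across(
    ends_with("_qc"),
    # Look up the matching value column (same name without "_qc") and set it
    # to NA wherever the QC flag is 4. Note that cur_data() is deprecated as
    # of dplyr 1.1.0 in favour of pick().
    ~ replace(cur_data()[[sub("_qc$", "", cur_column())]], . == 4L, NA),
    # Write the result back under the value column's name, overwriting it.
    .names = "{sub('_qc$', '', .col)}"
  ))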
# # A tibble: 2 × 4
# chla_adjusted chla_adjusted_qc bbp_adjusted bbp_adjusted_qc
# <dbl> <chr> <dbl> <chr>
# 1 NA 4 0.1 2
# 2 2 1 NA 4
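To see what the across() call does for a single column pair, the same steps can be written out by hand. This is only an illustrative sketch: it rebuilds the example data as a plain data.frame so it runs without any packages, and qc_col, target, flagged and masked are names made up for the illustration.

```r
# Example data from the question, as a base data.frame
df <- data.frame(
  chla_adjusted = c(100, 2),
  chla_adjusted_qc = c("4", "1"),
  bbp_adjusted = c(0.1, 9999),
  bbp_adjusted_qc = c("2", "4"),
  stringsAsFactors = FALSE
)

qc_col <- "chla_adjusted_qc"            # what cur_column() would return
target <- sub("_qc$", "", qc_col)       # name of the column to modify
flagged <- df[[qc_col]] == 4L           # 4L is coerced to "4" for the comparison
masked <- replace(df[[target]], flagged, NA)
masked
# NA 2
```

Inside mutate(), the .names argument is what makes this result overwrite chla_adjusted instead of chla_adjusted_qc.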
CodePudding user response:
Base R solution:
for (v in grep("_qc$", names(df), value = TRUE)) {
  # v is a QC column; the matching value column has the same name without "_qc"
  df[[sub("_qc$", "", v)]][df[[v]] == 4] <- NA
}
> df
# A tibble: 2 × 4
chla_adjusted chla_adjusted_qc bbp_adjusted bbp_adjusted_qc
<dbl> <chr> <dbl> <chr>
1 NA 4 0.1 2
2 2 1 NA 4
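If the loop is needed in more than one place, it could be wrapped in a small function. The sketch below is one possible way to do that, not part of the answer above: mask_flagged and its flag and suffix arguments are hypothetical names. It uses %in% instead of ==, so a missing QC value is treated as "not flagged", and it skips any _qc column that has no matching value column.

```r
# Hypothetical helper: set value columns to NA where the paired QC column
# equals `flag`. Returns a modified copy; the input is not changed.
mask_flagged <- function(data, flag = "4", suffix = "_qc") {
  pattern <- paste0(suffix, "$")
  for (qc in grep(pattern, names(data), value = TRUE)) {
    target <- sub(pattern, "", qc)
    if (!target %in% names(data)) next   # no matching value column
    bad <- data[[qc]] %in% flag          # FALSE (not NA) for missing flags
    data[[target]][bad] <- NA
  }
  data
}

df <- data.frame(
  chla_adjusted = c(100, 2),
  chla_adjusted_qc = c("4", "1"),
  bbp_adjusted = c(0.1, 9999),
  bbp_adjusted_qc = c("2", "4"),
  stringsAsFactors = FALSE
)

out <- mask_flagged(df)
out
```

The same function should also accept a tibble, since it relies only on [[ and names().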
CodePudding user response:
We could use across2 from dplyover:
library(dplyover)
df %>%
  mutate(across2(ends_with('adjusted'), ends_with('_qc'),
                 # rows with QC flag 4 match no condition, so case_when() gives NA
                 ~ case_when(.y != 4 ~ .x), .names = "{xcol}"))
-output
# A tibble: 2 × 4
chla_adjusted chla_adjusted_qc bbp_adjusted bbp_adjusted_qc
<dbl> <chr> <dbl> <chr>
1 NA 4 0.1 2
2 2 1 NA 4
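For comparison, the same pairwise idea can be sketched in base R with Map(), which walks two lists of columns in parallel. This is an assumption-laden illustration rather than a description of how dplyover works internally: value_cols and qc_cols are names introduced here, and the QC column names are built from the value column names so that the pairing does not depend on column order.

```r
df <- data.frame(
  chla_adjusted = c(100, 2),
  chla_adjusted_qc = c("4", "1"),
  bbp_adjusted = c(0.1, 9999),
  bbp_adjusted_qc = c("2", "4"),
  stringsAsFactors = FALSE
)

# Value columns are those that do not end in "_qc"; each is paired by name
# with its QC column.
value_cols <- grep("_qc$", names(df), value = TRUE, invert = TRUE)
qc_cols <- paste0(value_cols, "_qc")

# Keep x where the flag is not "4", otherwise NA (x = value, y = QC flag)
df[value_cols] <- Map(
  function(x, y) ifelse(y != "4", x, NA),
  df[value_cols],
  df[qc_cols]
)
df
```

One difference from the %in% approach: with ifelse(), a missing QC flag also yields NA in the value column.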