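In a tibble, I would like to correct certain values taken by the variables nbeta_dep01, nbeta_dep02, ... The rule: when every nbeta_depXX value in a row is 0, the column whose suffix matches dep_impl should be set to 1.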
Below is a reproducible example of what I'm doing.
I would like to know whether there is a way to shorten the syntax, because in my example I copy and paste the correction instruction once for every nbeta_depXX variable.
suppressMessages(library(dplyr))
test <- tribble(
~ent, ~dep_impl, ~nbeta_dep01, ~nbeta_dep02, ~nbeta_dep03, ~nbeta_dep04, ~nbeta_dep05,
"a", "01", 0, 0, 0, 0, 0,
"b", "03", 2, 0, 3, 0, 1,
"c", "05", 0, 0, 0, 1, 0,
"d", "02", 0, 0, 0, 0, 0
)
# Repeated once per column: if every nbeta_depXX in the row is 0,
# set the column matching dep_impl to 1
test %>%
  rowwise() %>%
  mutate(
    nbeta_dep01 = ifelse(
      nbeta_dep01==0 & nbeta_dep02==0 & nbeta_dep03==0 & nbeta_dep04==0 & nbeta_dep05==0 & dep_impl=="01",
      1,
      nbeta_dep01),
    nbeta_dep02 = ifelse(
      nbeta_dep01==0 & nbeta_dep02==0 & nbeta_dep03==0 & nbeta_dep04==0 & nbeta_dep05==0 & dep_impl=="02",
      1,
      nbeta_dep02),
    nbeta_dep03 = ifelse(
      nbeta_dep01==0 & nbeta_dep02==0 & nbeta_dep03==0 & nbeta_dep04==0 & nbeta_dep05==0 & dep_impl=="03",
      1,
      nbeta_dep03),
    nbeta_dep04 = ifelse(
      nbeta_dep01==0 & nbeta_dep02==0 & nbeta_dep03==0 & nbeta_dep04==0 & nbeta_dep05==0 & dep_impl=="04",
      1,
      nbeta_dep04)
  )
#> # A tibble: 4 x 7
#> # Rowwise:
#> ent dep_impl nbeta_dep01 nbeta_dep02 nbeta_dep03 nbeta_dep04 nbeta_dep05
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 a 01 1 0 0 0 0
#> 2 b 03 2 0 3 0 1
#> 3 c 05 0 0 0 1 0
#> 4 d 02 0 1 0 0 0
Created on 2021-10-25 by the reprex package (v2.0.1)
Answer:
You could use
library(dplyr)
library(stringr)
test %>%
  mutate(across(matches("dep\\d+$"),
                ~ifelse(rowSums(across(nbeta_dep01:nbeta_dep05)) == 0 & dep_impl == str_extract(cur_column(), "\\d+$"),
                        1,
                        .x)))
This returns
# A tibble: 4 x 7
ent dep_impl nbeta_dep01 nbeta_dep02 nbeta_dep03 nbeta_dep04 nbeta_dep05
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a 01 1 0 0 0 0
2 b 03 2 0 3 0 1
3 c 05 0 0 0 1 0
4 d 02 0 1 0 0 0
- We identify the columns to be changed with a regular expression: "dep\\d+$" matches all columns whose names end with "dep" followed by one or more digits (two in this example). Those columns are passed to an across() function.
- The ifelse() condition is simplified: since all nbeta_dep columns need to be 0, we take the sum of those columns with rowSums() applied to a selecting across() call (a sum of 0 means all zeros here because the values are non-negative counts). Furthermore, we check whether the digits in the current column name, obtained with cur_column(), match the digits in column dep_impl.
- If these conditions are met, we return 1; otherwise the value already in the current column/row, .x, is returned.
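Instead of calling across() again inside the lambda for every column, the "all nbeta columns are 0" test can also be computed once as a temporary column before the across() call. This is a minimal sketch, assuming dplyr 1.0.5 or later (for if_all()); the helper column all_zero is a hypothetical name that is dropped at the end, and base sub() stands in for stringr::str_extract() to get the column suffix.

```r
suppressMessages(library(dplyr))

test <- tribble(
  ~ent, ~dep_impl, ~nbeta_dep01, ~nbeta_dep02, ~nbeta_dep03, ~nbeta_dep04, ~nbeta_dep05,
  "a", "01", 0, 0, 0, 0, 0,
  "b", "03", 2, 0, 3, 0, 1,
  "c", "05", 0, 0, 0, 1, 0,
  "d", "02", 0, 0, 0, 0, 0
)

result <- test %>%
  mutate(
    # Hypothetical helper column: TRUE when every nbeta_depXX value in the row is 0
    all_zero = if_all(matches("^nbeta_dep\\d+$"), ~ .x == 0),
    # For each nbeta_depXX column, write 1 where the row is all zero and
    # dep_impl equals the column's suffix ("01", "02", ...); otherwise keep .x
    across(matches("^nbeta_dep\\d+$"),
           ~ ifelse(all_zero & dep_impl == sub("^nbeta_dep", "", cur_column()), 1, .x))
  ) %>%
  select(-all_zero)  # drop the helper column again

print(result)
```

Because the row-level condition no longer lists the columns by name, adding nbeta_dep06 and further columns requires no change to the code.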
Answer:
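You can refer to the columns whose names all start the same way with the function starts_with(). Inside the across() lambda, use .x for the current column and compare dep_impl with the digits of cur_column(); hard-coding nbeta_dep01 and "01" there would copy nbeta_dep01 into every column:
test %>%
  mutate(across(starts_with("nbeta"),
                ~ifelse(
                  nbeta_dep01==0 & nbeta_dep02==0 & nbeta_dep03==0 & nbeta_dep04==0 & nbeta_dep05==0 &
                    dep_impl==sub("nbeta_dep", "", cur_column()),
                  1,
                  .x)))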
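For comparison, the same correction can be sketched in base R: startsWith() plays the role of starts_with(), and matrix indexing with a two-column (row, column) index writes all the 1s in one step. This assumes every nbeta column is numeric and that the column for a dep_impl value "XX" is named nbeta_depXX; rows whose dep_impl has no matching column are left unchanged. tribble() is used only to rebuild the example data, and the names nbeta_cols, rows, cols, keep and fixed are illustrative.

```r
suppressMessages(library(dplyr))  # only used here for tribble()

test <- tribble(
  ~ent, ~dep_impl, ~nbeta_dep01, ~nbeta_dep02, ~nbeta_dep03, ~nbeta_dep04, ~nbeta_dep05,
  "a", "01", 0, 0, 0, 0, 0,
  "b", "03", 2, 0, 3, 0, 1,
  "c", "05", 0, 0, 0, 1, 0,
  "d", "02", 0, 0, 0, 0, 0
)

# Base R counterpart of starts_with("nbeta")
nbeta_cols <- names(test)[startsWith(names(test), "nbeta")]

m <- as.matrix(test[nbeta_cols])            # one matrix column per nbeta_depXX
rows <- which(rowSums(m == 0) == ncol(m))   # rows where every nbeta value is 0
cols <- match(paste0("nbeta_dep", test$dep_impl[rows]), nbeta_cols)
keep <- !is.na(cols)                        # skip dep_impl values without a column
m[cbind(rows[keep], cols[keep])] <- 1       # one (row, column) cell set to 1 per row

fixed <- test
fixed[nbeta_cols] <- as.data.frame(m)
print(fixed)
```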