I am trying to mutate a tibble based on the following conditions:
- For each row, if the column named with only the prefix (a or b) has the value 1, the other columns starting with that prefix should be recoded to 1 as well.
- However, if any of the other columns starting with the prefix already has the value 1 in that row, the values in all columns with that prefix should remain unchanged.
- The columns named with only the prefix should be deleted after the mutation.
A reproducible example is:
tibble(a = c(1, 1, 0, 0, 1),
       a.1 = c(0, 0, 1, 0, 1),
       a.2 = c(0, 0, 0, 1, 0),
       b = c(0, 0, 0, 0, 1),
       b.1 = c(0, 0, 0, 1, 0),
       b.2 = c(0, 0, 0, 0, 0))
# A tibble: 5 × 6
      a   a.1   a.2     b   b.1   b.2
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     0     0     0     0     0
2     1     0     0     0     0     0
3     0     1     0     0     0     0
4     0     0     1     0     1     0
5     1     1     0     1     0     0
The end result should look like:
tibble(
  a.1 = c(1, 1, 1, 0, 1),
  a.2 = c(1, 1, 0, 1, 0),
  b.1 = c(0, 0, 0, 1, 1),
  b.2 = c(0, 0, 0, 0, 1))
# A tibble: 5 × 4
    a.1   a.2   b.1   b.2
  <dbl> <dbl> <dbl> <dbl>
1     1     1     0     0
2     1     1     0     0
3     1     0     0     0
4     0     1     1     0
5     1     0     1     1
The number of variables per prefix is not constant in my real data, so I am trying to write a general function.
If anyone can help me out, it is greatly appreciated :)
CodePudding user response:
A solution with split.default and map_dfc:
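library(purrr)  # tbl is the tibble from the question
tbl %>%
  split.default(gsub("\\..*", "", colnames(.))) %>%  # group columns by prefix
  map_dfc(~ {
    # set rows to 1 where the prefix column is 1 and all other columns are 0
    .x[.x[[1]] == 1 & rowSums(.x[-1]) == 0, ] <- 1
    .x[-1]  # drop the prefix-only column
  })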
Output:
# A tibble: 5 × 4
    a.1   a.2   b.1   b.2
  <dbl> <dbl> <dbl> <dbl>
1     1     1     0     0
2     1     1     0     0
3     1     0     0     0
4     0     1     1     0
5     1     0     1     1
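Since the question asks for a general function, here is a minimal base R sketch of the same rule. The name fill_from_prefix is hypothetical (it does not come from any package). The sketch assumes that a column's prefix is everything before the first dot, that values are 0/1 with no NAs, and that a prefix without a prefix-only column (or without suffixed columns) should be left untouched.

```r
# Hypothetical helper: apply the prefix rule to every prefix group in df.
fill_from_prefix <- function(df) {
  # Prefix of each column: the part of the name before the first "."
  prefix <- sub("\\..*", "", names(df))
  drop <- character()

  for (p in unique(prefix)) {
    # Suffixed columns that belong to this prefix (e.g. a.1, a.2 for "a")
    members <- names(df)[prefix == p & names(df) != p]
    # Skip groups with no prefix-only column or no suffixed columns
    if (!(p %in% names(df)) || length(members) == 0) next

    # Rows where the prefix column is 1 and no suffixed column is 1 yet
    fill <- df[[p]] == 1 & rowSums(df[members] == 1) == 0
    for (m in members) df[[m]][fill] <- 1

    drop <- c(drop, p)  # remember the prefix-only column for removal
  }

  # Remove the prefix-only columns after all groups are processed
  df[setdiff(names(df), drop)]
}

# Example data from the question, as a plain data frame
df <- data.frame(
  a = c(1, 1, 0, 0, 1), a.1 = c(0, 0, 1, 0, 1), a.2 = c(0, 0, 0, 1, 0),
  b = c(0, 0, 0, 0, 1), b.1 = c(0, 0, 0, 1, 0), b.2 = c(0, 0, 0, 0, 0)
)
res <- fill_from_prefix(df)
```

For the example data, res holds the same values as the output above. Handling each prefix in a loop keeps the number of columns per prefix flexible, which is the situation described in the question.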