I wrote a function that checks, for a given variable, whether the values within each group are not all the same. It creates a new variable that holds the name of the variable if the values differ within the group, or NA if they do not.
library(tidyverse)
x <- c(2,4,5,5,6,2,3)
y <- c(5,5,2,3,6,1,8)
z <- c(5,2,4,1,3,5,1)
xy <- tibble(x, y, z)
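diff_var <- function(a, c) {
  a %>%
    # count the distinct values of the column within each group
    transmute("{{c}}" := n_distinct({{c}})) %>%
    ungroup() %>%
    select({{c}}) %>%
    # keep the column name where the group has more than one distinct value
    imap_dfc(~ if_else(.x > 1, .y, NA_character_))
}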
xy %>%
group_by(x) %>%
diff_var(., y)
# A tibble: 7 × 1
y
<chr>
1 y
2 NA
3 y
4 y
5 NA
6 y
7 NA
I’m now trying to figure out how to do this across multiple variables (ideally excluding the grouping variable); for the sample data here, that means y and z. The grouping variable will be the same for all variables. My various attempts at using different forms of map have failed; I'm struggling to pass the arguments into my custom function correctly. Eventually I'll want the grouping variable included in the output, but I can easily add that back later.
Desired output:
# A tibble: 7 × 2
y z
<chr> <chr>
1 y NA
2 NA NA
3 y z
4 y z
5 NA NA
6 y NA
7 NA NA
Though there are similarly titled questions on this site, I haven't been able to adapt them to my current case.
CodePudding user response:
It may be easier to use across() and pass the column names in c as a character vector of quoted names. In addition, the last step can be done within dplyr itself, again using across() together with cur_column().
diff_var <- function(a, c) {
  a %>%
    transmute(across(all_of(c), n_distinct)) %>%
    ungroup() %>%
    select(all_of(c)) %>%
    mutate(across(everything(), ~ case_when(.x > 1 ~ cur_column())))
}
Testing:
xy %>%
group_by(x) %>%
diff_var(., c("y", "z"))
# A tibble: 7 × 2
y z
<chr> <chr>
1 y <NA>
2 <NA> <NA>
3 y z
4 y z
5 <NA> <NA>
6 y <NA>
7 <NA> <NA>
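Since the question mentions eventually wanting the grouping variable back in the result, here is a minimal sketch of one way that might be done. The name diff_var_keep and its cols argument are made up for illustration and are not part of the answer above. It assumes the input is already grouped, and it uses mutate() instead of transmute() so the per-group check and the labelling happen in one step.

```r
library(dplyr)
library(tibble)

# Same sample data as in the question
xy <- tibble(
  x = c(2, 4, 5, 5, 6, 2, 3),
  y = c(5, 5, 2, 3, 6, 1, 8),
  z = c(5, 2, 4, 1, 3, 5, 1)
)

# Hypothetical variant of diff_var() that keeps the grouping column(s).
# `a` is assumed to be a grouped tibble; `cols` is a character vector of column names.
diff_var_keep <- function(a, cols) {
  a %>%
    # within each group: the column name if it has more than one distinct value, else NA
    mutate(across(
      all_of(cols),
      ~ if_else(n_distinct(.x) > 1, cur_column(), NA_character_)
    )) %>%
    ungroup() %>%
    # grouping column(s) first, then the checked columns
    select(all_of(c(group_vars(a), cols)))
}

res <- xy %>%
  group_by(x) %>%
  diff_var_keep(c("y", "z"))

print(res)
```

The y and z columns should match the output shown above, with x kept as the first column. Passing the column names as strings keeps the interface the same as in the answer's diff_var().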