How to use dplyr::across to check unique values in multiple columns by group?
The code below treats each column independently. I would like the number of unique values across the variables DX1:DX4 taken together, per id.
Here id 1 would have 5 unique values: A, B, C, D, F. id 2 would also have 5: A, B, C, D, E.
library(dplyr)
# counts distinct values per column, not across the columns together
x <- df %>%
  group_by(id) %>%
  summarize(across(DX1:DX4, n_distinct, na.rm = TRUE))
df <- read.table(header = TRUE, text = "
id DX1 DX2 DX3 DX4
1 A B A A
1 A A A C
1 D A A A
1 A A A F
1 A A A A
2 A A A A
2 A C A A
2 A A A D
2 A E D B
", stringsAsFactors = FALSE)
CodePudding user response:
After grouping by 'id', use across to select the columns, flatten them into a single vector with unlist/flatten_chr, and get the number of distinct elements (n_distinct).
library(dplyr)
library(purrr)
df %>%
  group_by(id) %>%
  summarise(n = n_distinct(flatten_chr(across(DX1:DX4)), na.rm = TRUE),
            .groups = 'drop')
-output
# A tibble: 2 × 2
     id     n
  <int> <int>
1     1     5
2     2     5
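Another way to see what the flattening does is to reshape the data to long format first, so that all DX values sit in one column before counting. This is only a sketch of an alternative, not part of the answer above; it assumes the tidyr package is installed, and the column names `var` and `value` are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- read.table(header = TRUE, text = "
id DX1 DX2 DX3 DX4
1 A B A A
1 A A A C
1 D A A A
1 A A A F
1 A A A A
2 A A A A
2 A C A A
2 A A A D
2 A E D B
", stringsAsFactors = FALSE)

res_long <- df %>%
  # stack DX1..DX4 into a single 'value' column (one row per id/cell)
  pivot_longer(DX1:DX4, names_to = "var", values_to = "value") %>%
  group_by(id) %>%
  # count distinct values per id over the stacked column
  summarise(n = n_distinct(value, na.rm = TRUE), .groups = "drop")

res_long
```

The long format is also convenient if you later want counts per value as well as the number of distinct values.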
CodePudding user response:
I don't think across is the "tidyverse" way to go. I suggest cur_data() instead.
df %>%
  group_by(id) %>%
  summarise(n = n_distinct(unlist(select(cur_data(), DX1:DX4))))
## A tibble: 2 × 2
#      id     n
#   <int> <int>
# 1     1     5
# 2     2     5
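Note that cur_data() is deprecated as of dplyr 1.1.0 in favor of pick(). A minimal sketch of the same approach with pick(), assuming dplyr 1.1.0 or later and the example data from the question:

```r
library(dplyr)

# Example data from the question
df <- read.table(header = TRUE, text = "
id DX1 DX2 DX3 DX4
1 A B A A
1 A A A C
1 D A A A
1 A A A F
1 A A A A
2 A A A A
2 A C A A
2 A A A D
2 A E D B
", stringsAsFactors = FALSE)

res <- df %>%
  group_by(id) %>%
  summarise(
    # pick() returns the selected columns for the current group as a tibble;
    # unlist() flattens them into one vector before counting
    n = n_distinct(unlist(pick(DX1:DX4), use.names = FALSE), na.rm = TRUE),
    .groups = "drop"
  )

res
```

pick() does the selection and the "current group" lookup in one call, so the separate select(cur_data(), ...) step is no longer needed.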
CodePudding user response:
Base R (the \(x) shorthand for function(x) requires R 4.1.0 or later):
> df=read.table(header=T,text="id DX1 DX2 DX3 DX4\n1 A B A A\n1 A A A C\n1 D A A A\n1 A A A F\n1 A A A A\n2 A A A A\n2 A C A A\n2 A A A D\n2 A E D B")
> sapply(split(df[,-1],df[,1]),\(x)length(unique(unlist(x))))
1 2
5 5
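To check which values are being counted, and not only how many, the same split can return the unique values themselves. This is a small sketch along the lines of the base R answer; the name `vals` is made up for illustration.

```r
# Example data from the question
df <- read.table(header = TRUE, text = "
id DX1 DX2 DX3 DX4
1 A B A A
1 A A A C
1 D A A A
1 A A A F
1 A A A A
2 A A A A
2 A C A A
2 A A A D
2 A E D B
", stringsAsFactors = FALSE)

# Split the DX columns by id, flatten each piece, keep sorted unique values
vals <- lapply(
  split(df[, -1], df$id),
  function(x) sort(unique(unlist(x, use.names = FALSE)))
)

vals
lengths(vals)  # number of unique values per id
```

For the example data this gives A, B, C, D, F for id 1 and A, B, C, D, E for id 2, matching the values listed in the question.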