I have data in below format:
Profile | Group |
---|---|
A | Grp1 |
A | Grp2 |
A | Grp3 |
B | Grp7 |
C | Grp32 |
B | Grp32 |
C | Grp7 |
D | Grp16 |
H | Grp19 |
M | Grp1 |
M | Grp2 |
M | Grp3 |
I would like to find out the profiles (1st column) that have same set of Groups (2nd column).
In the sample data above Profiles "A" and "M" have same set of groups (Grp1, Grp2, Grp3) and profiles "B" and "C" have same set of groups (Grp7, Grp32). Hence, the resulting data should be:
Profile | Group |
---|---|
A | Grp1 |
A | Grp2 |
A | Grp3 |
M | Grp1 |
M | Grp2 |
M | Grp3 |
B | Grp7 |
B | Grp32 |
C | Grp7 |
C | Grp32 |
Would appreciate any help.
CodePudding user response:
A dplyr
solution:
library(dplyr)
df %>%
group_by(Profile) %>%
summarise(Group = toString(sort(Group))) %>%
group_by(Group) %>%
filter(n() > 1) %>%
arrange(.by_group = TRUE) %>%
tidyr::separate_rows(Group)
# # A tibble: 10 × 2
# Profile Group
# <chr> <chr>
# 1 A Grp1
# 2 A Grp2
# 3 A Grp3
# 4 M Grp1
# 5 M Grp2
# 6 M Grp3
# 7 B Grp32
# 8 B Grp7
# 9 C Grp32
# 10 C Grp7
Data
df <- structure(list(Profile = c("A", "A", "A", "B", "C", "B", "C",
"D", "H", "M", "M", "M"), Group = c("Grp1", "Grp2", "Grp3", "Grp7",
"Grp32", "Grp32", "Grp7", "Grp16", "Grp19", "Grp1", "Grp2", "Grp3"
)), class = "data.frame", row.names = c(NA, -12L))