I have a dataframe like below and i am concatenating the IDs such that each Id should have one row. However, I am looking to summarize only the unique values. For example for region 1 and branch A, ID 1 has value "1" in both rows so output should be 1 instead of 1,1 but ID2 should be 1,2 as both values are unique. So only concatenate unique values.
Data <-
Region branch ID1 ID2 ID3 ID4
1 A 1 1 2 4
1 A 1 2 2 3
2 B 2 2 2 2
2 B 2 2 3 4
Desired Output
Region branch ID1 ID2 ID3 ID4
1 A 1 1,2 2 4,3
2 B 2 2 2,3 2,4
This is what I am trying
Data %>%
group_by(Region, branch) %>%
summarise(across(everything(), ~toString(.)))
CodePudding user response:
How about
library(tidyverse)
Data %>%
group_by(Region, branch) %>%
summarise(
across(
everything(),
~paste(as.list(unique(.)), collapse=","),
.groups="drop"
)
)
# A tibble: 2 × 6
Region branch ID1 ID2 ID3 ID4
<int> <chr> <chr> <chr> <chr> <chr>
1 1 A 1 1,2 2 4,3
2 2 B 2 2 2,3 2,4
Edit
It also works without the as.list()
.
CodePudding user response:
Depends on if you want lists or strings with a comma. For the former, do
res1 <- by(Data, Data$branch, \(x) list2DF(lapply(x, \(.) list(unique(.))))) |>
do.call(what=rbind)
res1
# Region branch ID1 ID2 ID3 ID4
# A 1 A 1 1, 2 2 4, 3
# B 2 B 2 2 2, 3 2, 4
where:
str(res)
# 'data.frame': 2 obs. of 6 variables:
# $ Region:List of 2
# ..$ : int 1
# ..$ : int 2
# $ branch:List of 2
# ..$ : chr "A"
# ..$ : chr "B"
# $ ID1 :List of 2
# ..$ : int 1
# ..$ : int 2
# $ ID2 :List of 2
# ..$ : int 1 2
# ..$ : int 2
# $ ID3 :List of 2
# ..$ : int 2
# ..$ : int 2 3
# $ ID4 :List of 2
# ..$ : int 4 3
# ..$ : int 2 4
And for the latter,
res2 <- by(Data, Data$branch, \(x) list2DF(lapply(x, \(.) toString(unique(.))))) |>
do.call(what=rbind)
res2
# Region branch ID1 ID2 ID3 ID4
# A 1 A 1 1, 2 2 4, 3
# B 2 B 2 2 2, 3 2, 4
where:
str(res2)
# 'data.frame': 2 obs. of 6 variables:
# $ Region: chr "1" "2"
# $ branch: chr "A" "B"
# $ ID1 : chr "1" "2"
# $ ID2 : chr "1, 2" "2"
# $ ID3 : chr "2" "2, 3"
# $ ID4 : chr "4, 3" "2, 4"