Home > Back-end >  R concatenate only unique values in rows by ID
R concatenate only unique values in rows by ID

Time:06-07

I have a dataframe like below and i am concatenating the IDs such that each Id should have one row. However, I am looking to summarize only the unique values. For example for region 1 and branch A, ID 1 has value "1" in both rows so output should be 1 instead of 1,1 but ID2 should be 1,2 as both values are unique. So only concatenate unique values.

Data <-
Region branch ID1 ID2 ID3 ID4
1 A 1 1 2 4
1 A 1 2 2 3
2 B 2 2 2 2
2 B 2 2 3 4

Desired Output
Region branch ID1 ID2 ID3 ID4
    1 A 1 1,2 2 4,3
    2 B 2 2 2,3 2,4

This is what I am trying

Data %>%
group_by(Region, branch) %>% 
  summarise(across(everything(), ~toString(.)))

CodePudding user response:

How about

library(tidyverse)

Data %>% 
  group_by(Region, branch) %>% 
  summarise(
    across(
      everything(), 
      ~paste(as.list(unique(.)), collapse=","),
      .groups="drop"
    )
  ) 
# A tibble: 2 × 6
  Region branch ID1   ID2   ID3   ID4  
   <int> <chr>  <chr> <chr> <chr> <chr>
1      1 A      1     1,2   2     4,3  
2      2 B      2     2     2,3   2,4  

Edit

It also works without the as.list().

CodePudding user response:

Depends on if you want lists or strings with a comma. For the former, do

res1 <- by(Data, Data$branch, \(x) list2DF(lapply(x, \(.) list(unique(.))))) |>
  do.call(what=rbind)
res1
#   Region branch ID1  ID2  ID3  ID4
# A      1      A   1 1, 2    2 4, 3
# B      2      B   2    2 2, 3 2, 4

where:

str(res)
# 'data.frame': 2 obs. of  6 variables:
#  $ Region:List of 2
#   ..$ : int 1
#   ..$ : int 2
#  $ branch:List of 2
#   ..$ : chr "A"
#   ..$ : chr "B"
#  $ ID1   :List of 2
#   ..$ : int 1
#   ..$ : int 2
#  $ ID2   :List of 2
#   ..$ : int  1 2
#   ..$ : int 2
#  $ ID3   :List of 2
#   ..$ : int 2
#   ..$ : int  2 3
#  $ ID4   :List of 2
#   ..$ : int  4 3
#   ..$ : int  2 4

And for the latter,

res2 <- by(Data, Data$branch, \(x) list2DF(lapply(x, \(.) toString(unique(.))))) |>
  do.call(what=rbind)
res2
#   Region branch ID1  ID2  ID3  ID4
# A      1      A   1 1, 2    2 4, 3
# B      2      B   2    2 2, 3 2, 4

where:

str(res2)
# 'data.frame': 2 obs. of  6 variables:
#  $ Region: chr  "1" "2"
#  $ branch: chr  "A" "B"
#  $ ID1   : chr  "1" "2"
#  $ ID2   : chr  "1, 2" "2"
#  $ ID3   : chr  "2" "2, 3"
#  $ ID4   : chr  "4, 3" "2, 4"
  •  Tags:  
  • r
  • Related