Combine character strings into a single string by group in R

Time:08-14

I have data that looks like this:

library(dplyr)  

Input <- tibble(
      UID = c("Code001", "Code001","Code002","Code002","Code002","Code002","Code003","Code003","Code003","Code004","Code005","Code005"),
      Name = c("NameABC", "Name JHB", "NameYDB", "Name & KBC","Name-S-D-T", "Name DF-R","NameMEY", "Name PeU","Name Eed", "NameBOR", "Name-Ptg","NameLKD")) 

For each unique UID, I would like to combine the Name values into a single string. If a UID has two entries, they should be separated by " & ". If it has more than two, the entries should be separated by commas, except for the last two, which should be separated by " & ". I would like to create a new data frame that looks like this:

Output <- tibble(
      UID = c("Code001", "Code002","Code003","Code004","Code005"),
      Name = c("NameABC & Name JHB", "NameYDB, Name & KBC, Name-S-D-T & Name DF-R", "NameMEY, Name PeU & Name Eed", "NameBOR", "Name-Ptg & NameLKD")) 

Answer:

I would define a function that concatenates a character vector in the required format, then call it inside a grouped summarize():

library(dplyr)

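concat_names <- function(x) {
  len_x <- length(x)
  if (len_x < 3) {
    # one name is returned as-is; two names are joined with " & "
    paste(x, collapse = " & ")
  } else {
    # all but the last name joined with ", ", then "&" before the last one
    paste(paste(x[-len_x], collapse = ", "), "&", x[len_x])
  }
}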

Input %>%
  group_by(UID) %>%
  summarize(Name = concat_names(Name))

Result:

# A tibble: 5 × 2
  UID     Name                                       
  <chr>   <chr>                                      
1 Code001 NameABC & Name JHB                         
2 Code002 NameYDB, Name & KBC, Name-S-D-T & Name DF-R
3 Code003 NameMEY, Name PeU & Name Eed               
4 Code004 NameBOR                                    
5 Code005 Name-Ptg & NameLKD
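As a quick check outside of any grouping, the helper can be called directly on plain character vectors. The snippet repeats concat_names() so it runs on its own; the empty-vector case is an extra illustration that the question's data never produces, since summarize() only sees non-empty groups.

```r
# Same helper as in the answer above, repeated so this snippet is self-contained
concat_names <- function(x) {
  len_x <- length(x)
  if (len_x < 3) {
    paste(x, collapse = " & ")
  } else {
    paste(paste(x[-len_x], collapse = ", "), "&", x[len_x])
  }
}

concat_names("NameBOR")                             # "NameBOR"
concat_names(c("NameABC", "Name JHB"))              # "NameABC & Name JHB"
concat_names(c("NameMEY", "Name PeU", "Name Eed"))  # "NameMEY, Name PeU & Name Eed"
concat_names(character(0))                          # "" (empty input)
```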

Answer:

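We can collapse each group with toString(), which joins the names with ", ", and then use sub() to replace the last comma with " &" (the greedy (.*) captures everything up to the final comma). The native pipe |> requires R 4.1.0 or later: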
library(dplyr)

Input |>
  group_by(UID) |>
  summarise(Name = sub("(.*),", "\\1 &", toString(Name)))
Output:
# A tibble: 5 × 2
  UID     Name 
  <chr>   <chr>                                      
1 Code001 NameABC & Name JHB                         
2 Code002 NameYDB, Name & KBC, Name-S-D-T & Name DF-R
3 Code003 NameMEY, Name PeU & Name Eed               
4 Code004 NameBOR                                    
5 Code005 Name-Ptg & NameLKD                                                 
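One caveat, as far as I can tell: because the pattern "(.*)," replaces the last comma in the collapsed string, the result is wrong if the last name in a group itself contains a comma. The names "Smith" and "Jones, Ltd" below are hypothetical and chosen only to show this, and regex_concat() and index_concat() are illustrative helper names, not part of any package. The index-based version is a sketch in the spirit of the other answers: it decides where the " & " goes by position rather than by searching the text.

```r
# Regex approach from the answer above, wrapped in a hypothetical helper
regex_concat <- function(x) sub("(.*),", "\\1 &", toString(x))

regex_concat(c("NameMEY", "Name PeU", "Name Eed"))  # "NameMEY, Name PeU & Name Eed"

# Hypothetical names where the last one contains a comma
regex_concat(c("Smith", "Jones, Ltd"))  # "Smith, Jones & Ltd" (the & lands inside the name)

# Position-based alternative: only the separator before the last element changes
index_concat <- function(x) {
  n <- length(x)
  if (n < 2) return(paste(x, collapse = ""))  # zero or one name: nothing to join
  paste0(paste(x[-n], collapse = ", "), " & ", x[n])
}

index_concat(c("Smith", "Jones, Ltd"))  # "Smith & Jones, Ltd"
```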

Answer:

You can create a helper function f that creates the pasted values:

f <- function(x) {
  if (length(x) == 1) return(x)                            # single name: unchanged
  if (length(x) == 2) return(paste0(x, collapse = " & "))  # two names: "A & B"
  # three or more names: "A, B, ... & Z"
  paste0(paste0(x[1:(length(x) - 1)], collapse = ", "), " & ", x[length(x)])
}

Then apply the function in summarize(), grouped by UID:

group_by(Input, UID) %>% summarize(Name = f(Name))

Output:

# A tibble: 5 × 2
  UID     Name                                       
  <chr>   <chr>                                      
1 Code001 NameABC & Name JHB                         
2 Code002 NameYDB, Name & KBC, Name-S-D-T & Name DF-R
3 Code003 NameMEY, Name PeU & Name Eed               
4 Code004 NameBOR                                    
5 Code005 Name-Ptg & NameLKD 
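If dplyr is not available, the same grouping can be sketched in base R with split() and vapply(), reusing f() from this answer. This is an alternative I am adding, not part of the original answer; split() orders the groups by sorted UID and keeps the original row order within each group, which matches the output above for this data.

```r
# Helper from the answer above, repeated so this snippet is self-contained
f <- function(x) {
  if (length(x) == 1) return(x)
  if (length(x) == 2) return(paste0(x, collapse = " & "))
  paste0(paste0(x[1:(length(x) - 1)], collapse = ", "), " & ", x[length(x)])
}

Input <- data.frame(
  UID = c("Code001", "Code001", "Code002", "Code002", "Code002", "Code002",
          "Code003", "Code003", "Code003", "Code004", "Code005", "Code005"),
  Name = c("NameABC", "Name JHB", "NameYDB", "Name & KBC", "Name-S-D-T",
           "Name DF-R", "NameMEY", "Name PeU", "Name Eed", "NameBOR",
           "Name-Ptg", "NameLKD"),
  stringsAsFactors = FALSE
)

# A named list of Name vectors, one element per UID
groups <- split(Input$Name, Input$UID)

Output <- data.frame(
  UID = names(groups),
  # apply f() to each group; unname() keeps UIDs from becoming row names
  Name = unname(vapply(groups, f, character(1))),
  stringsAsFactors = FALSE
)
Output
```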