I have data that looks like this:
library(dplyr)
Input <- tibble(
  UID = c("Code001", "Code001", "Code002", "Code002", "Code002", "Code002", "Code003", "Code003", "Code003", "Code004", "Code005", "Code005"),
  Name = c("NameABC", "Name JHB", "NameYDB", "Name & KBC", "Name-S-D-T", "Name DF-R", "NameMEY", "Name PeU", "Name Eed", "NameBOR", "Name-Ptg", "NameLKD")
)
For each unique UID, I would like to combine the Name values into a single string. If there are two entries for a UID, they should be separated by an &. If there are more than two, they should be separated by commas, with an & between the last two. I would like to create a new data frame that looks like this:
Output <- tibble(
  UID = c("Code001", "Code002", "Code003", "Code004", "Code005"),
  Name = c("NameABC & Name JHB", "NameYDB, Name & KBC, Name-S-D-T & Name DF-R", "NameMEY, Name PeU & Name Eed", "NameBOR", "Name-Ptg & NameLKD")
)
CodePudding user response:
I would define a function that concatenates the character vector the way you want, then call it in a grouped summarize():
library(dplyr)
concat_names <- function(x) {
  len_x <- length(x)
  if (len_x < 3) {
    paste(x, collapse = " & ")
  } else {
    paste(paste(x[-len_x], collapse = ", "), "&", x[len_x])
  }
}

Input %>%
  group_by(UID) %>%
  summarize(Name = concat_names(Name))
Result:
# A tibble: 5 × 2
  UID     Name
  <chr>   <chr>
1 Code001 NameABC & Name JHB
2 Code002 NameYDB, Name & KBC, Name-S-D-T & Name DF-R
3 Code003 NameMEY, Name PeU & Name Eed
4 Code004 NameBOR
5 Code005 Name-Ptg & NameLKD
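To see how the length check behaves outside of summarize(), here is a minimal sketch in base R that calls the function directly on short vectors. The function is repeated so the block runs on its own, and the sample values "A", "B", "C" are made up for illustration:

```r
# Same helper as above, repeated so this block is self-contained.
concat_names <- function(x) {
  len_x <- length(x)
  if (len_x < 3) {
    # One or two values: join with " & " (a single value is returned unchanged).
    paste(x, collapse = " & ")
  } else {
    # Three or more: commas between all but the last, then " & " before the last.
    paste(paste(x[-len_x], collapse = ", "), "&", x[len_x])
  }
}

concat_names("A")               # "A"
concat_names(c("A", "B"))       # "A & B"
concat_names(c("A", "B", "C"))  # "A, B & C"
```

Because the function always returns a single string, it fits summarize(), which expects one value per group here.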
CodePudding user response:
- We can use toString() to join the names with ", " and then sub() to replace the last comma with " &":
library(dplyr)
Input |>
  group_by(UID) |>
  summarise(Name = sub("(.*),", "\\1 &", toString(Name)))
- Output
# A tibble: 5 × 2
  UID     Name
  <chr>   <chr>
1 Code001 NameABC & Name JHB
2 Code002 NameYDB, Name & KBC, Name-S-D-T & Name DF-R
3 Code003 NameMEY, Name PeU & Name Eed
4 Code004 NameBOR
5 Code005 Name-Ptg & NameLKD
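The pattern works because (.*) is greedy: it consumes everything up to the last comma in the string, so only that final comma is replaced. A small base R sketch of this, using a hypothetical wrapper name join_last_amp and made-up values, also shows one caveat: if a name itself contains a comma, the last comma in the string is no longer the separator you meant to replace.

```r
# Hypothetical wrapper around the same sub()/toString() expression.
join_last_amp <- function(x) sub("(.*),", "\\1 &", toString(x))

join_last_amp(c("A", "B", "C"))   # "A, B & C"
join_last_amp(c("A", "B"))        # "A & B"
join_last_amp("A")                # "A" (no comma, so nothing is replaced)

# Caveat: a comma inside a name is treated as the last separator.
join_last_amp(c("A", "B, Inc."))  # "A, B & Inc."
```

For the data in the question this is not an issue, since none of the names contain a comma.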
CodePudding user response:
You can create a helper function f that builds the combined string:
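f <- function(x) {
  # A single value is returned unchanged
  if (length(x) == 1) return(x)
  # Two values are joined with " & "
  if (length(x) == 2) return(paste0(x, collapse = " & "))
  # Three or more: commas between all but the last, then " & " before the last
  paste0(paste0(x[1:(length(x) - 1)], collapse = ", "), " & ", x[length(x)])
}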
Then apply the function in summarize(), grouped by UID:
library(dplyr)
group_by(Input, UID) %>%
  summarize(Name = f(Name))
Output:
# A tibble: 5 × 2
  UID     Name
  <chr>   <chr>
1 Code001 NameABC & Name JHB
2 Code002 NameYDB, Name & KBC, Name-S-D-T & Name DF-R
3 Code003 NameMEY, Name PeU & Name Eed
4 Code004 NameBOR
5 Code005 Name-Ptg & NameLKD