Concatenate values across multiple rows for various IDs in R

Time:12-01

My question is closely related to the following thread: concatenate values across two rows in R

The main difference is that I would like to concatenate only those rows that share the same ID. So I need to include a grouping of some kind, but I wasn't able to get it to work.

# input data
input <- data.frame(ID = c(1,1,1,3,3,3),
                   X1 = c("A", 1, 11, "D", 4, 44),
                   X2 = c("B", 2, 22, "E", 5, 55),
                   X3 = c("C", 3, 33, "F", 6, 66))

# desired output
output <- data.frame(ID = c(1,3),
                     X1 = c("A-1-11", "D-4-44"),
                     X2 = c("B-2-22", "E-5-55"),
                     X3 = c("C-3-33", "F-6-66"))

I tried the solution from the mentioned thread, but this concatenates all six rows:

output_v1 <- data.table::rbindlist(list(input, data.table::setDT(input)[, lapply(.SD, paste, collapse='-')]))

Obviously this does not work, since I am not grouping by ID, but I could not find a way to group in the documentation. Can anyone point me in the right direction?

Thanks a lot!

CodePudding user response:

We can use dplyr functions:

library(dplyr)
input %>% 
  group_by(ID) %>% 
  mutate(across(everything(), ~paste0(.,collapse = "-"))) %>% 
  slice(1)
# A tibble: 2 × 4
# Groups:   ID [2]
     ID X1     X2     X3    
  <dbl> <chr>  <chr>  <chr> 
1     1 A-1-11 B-2-22 C-3-33
2     3 D-4-44 E-5-55 F-6-66

CodePudding user response:

With tidyverse:

library(tidyverse)
input %>%
  as_tibble() %>%
  group_by(ID) %>%
  summarise(across(everything(), ~ str_flatten(.x, "-")))

returns:

# A tibble: 2 × 4
     ID X1     X2     X3    
  <dbl> <chr>  <chr>  <chr> 
1     1 A-1-11 B-2-22 C-3-33
2     3 D-4-44 E-5-55 F-6-66
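
Since the question asks about grouping in data.table itself, here is a minimal sketch of one way to do it, assuming the data.table package is installed: add by = ID to the same lapply(.SD, ...) call from the attempt in the question. The names dt and output_dt are only illustrative, and as.data.table() is used instead of setDT() so that the original input data frame is not converted in place.

```r
library(data.table)

input <- data.frame(ID = c(1, 1, 1, 3, 3, 3),
                    X1 = c("A", 1, 11, "D", 4, 44),
                    X2 = c("B", 2, 22, "E", 5, 55),
                    X3 = c("C", 3, 33, "F", 6, 66))

# as.data.table() makes a copy, so `input` itself stays a plain data.frame
dt <- as.data.table(input)

# Within each ID, .SD holds the non-grouping columns (X1, X2, X3);
# paste(..., collapse = "-") joins each column's values into one string
output_dt <- dt[, lapply(.SD, paste, collapse = "-"), by = ID]

print(output_dt)
```

This should give one row per ID with the column names X1, X2 and X3 unchanged, which matches the desired output in the question.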