How to paste two rows of an R dataframe based on a column index?

Time:04-12

I have an R data frame that looks like this:

rs2288741 rs1821185 rs1432315 ID
T         A         T         A 
T         C         T         A 
G         C         C         B 
T         C         T         B
G         A         C         C
G         A         C         C
T         A         C         D
T         C         T         D

I need to paste together the values of consecutive rows that share the same "ID", column by column. The output should look like this:

rs2288741 rs1821185 rs1432315 ID 
TT        AC        TT        A
GT        CC        CT        B
GG        AA        CC        C
TT        AC        CT        D

Is there an easy way to do this?

CodePudding user response:

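A data.table solution:

library(data.table)

# setDT() converts mydata to a data.table by reference;
# .SD holds every column except the grouping column ID
setDT(mydata)[, lapply(.SD, paste0, collapse = ""), by = .(ID)]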
#    ID rs2288741 rs1821185 rs1432315
# 1:  A        TT        AC        TT
# 2:  B        GT        CC        CT
# 3:  C        GG        AA        CC
# 4:  D        TT        AC        CT
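
To try this end to end, here is a minimal self-contained sketch. It assumes the data frame is called mydata, as in the answer above, and rebuilds it by hand from the table in the question. The final setcolorder() step is an addition, not part of the original answer; it only moves ID back to the last column to match the requested layout.

```r
library(data.table)

# Example data copied from the question; the name `mydata` is taken
# from the answer above (the question itself does not name the object).
mydata <- data.frame(
  rs2288741 = c("T", "T", "G", "T", "G", "G", "T", "T"),
  rs1821185 = c("A", "C", "C", "C", "A", "A", "A", "C"),
  rs1432315 = c("T", "T", "C", "T", "C", "C", "C", "T"),
  ID        = c("A", "A", "B", "B", "C", "C", "D", "D")
)

# Collapse every non-grouping column (.SD) into one string per ID.
res <- setDT(mydata)[, lapply(.SD, paste0, collapse = ""), by = .(ID)]

# Optional: put ID last, as in the desired output of the question.
setcolorder(res, c(setdiff(names(res), "ID"), "ID"))
res
```

Note that grouping by ID merges all rows with the same ID, whether or not they are adjacent. For the data shown that is the same thing, since equal IDs always sit next to each other.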

CodePudding user response:

If you use the tidyverse, you could do:

library(tidyverse)

df %>% 
  group_by(ID) %>%
  # collapse each non-grouping column into one string per ID
  summarize(across(everything(), ~ paste(.x, collapse = ''))) %>%
  # move ID back to the last column
  select(2:4, 1)
#> # A tibble: 4 x 4
#>   rs2288741 rs1821185 rs1432315 ID   
#>   <chr>     <chr>     <chr>     <chr>
#> 1 TT        AC        TT        A    
#> 2 GT        CC        CT        B    
#> 3 GG        AA        CC        C    
#> 4 TT        AC        CT        D

Created on 2022-04-11 by the reprex package (v2.0.1)

CodePudding user response:

A base R version could be

aggregate(. ~ ID, data = df, function(x) paste(x, collapse = ""))
#>   ID rs2288741 rs1821185 rs1432315
#> 1  A        TT        AC        TT
#> 2  B        GT        CC        CT
#> 3  C        GG        AA        CC
#> 4  D        TT        AC        CT
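
As with the other answers, aggregate() returns ID as the first column. If the layout from the question (ID last) matters, a small sketch along these lines could be used. It assumes the data frame is called df, as in the answer above, and reads the example data from text purely for illustration.

```r
# Example data from the question, read from text for illustration.
# colClasses = "character" stops read.table() from guessing types;
# a column holding only "T" values would otherwise be read as logical.
df <- read.table(text = "
rs2288741 rs1821185 rs1432315 ID
T A T A
T C T A
G C C B
T C T B
G A C C
G A C C
T A C D
T C T D
", header = TRUE, colClasses = "character")

# Same call as in the answer above: one pasted string per column and ID.
res <- aggregate(. ~ ID, data = df, FUN = function(x) paste(x, collapse = ""))

# Reorder the columns so that ID comes last.
res <- res[, c(setdiff(names(res), "ID"), "ID")]
res
```

Selecting columns by name with setdiff() rather than by position keeps the reordering step working if more marker columns are added.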