Merging every three columns of a data frame in R

Time:02-20

I have a data frame with around 900 columns, and I want to combine the contents of every 3 columns into one column. I tried to write a loop, but I'm not good at it. Can you please help me?
The data frame looks like this:

df_out <- data.frame(
  "name" = c("1", "2", "3", "4", "5", "6", "7", "8"),
  "col1"=rep("B",times= 8),
  "col2"=rep("A",times= 8),
  "col3"=rep("G",times= 8),
  "col4"=rep("C",times= 8),
  "col5"=rep("C",times=8),
  "col6"=rep("B",times= 8),
  "col7"=rep("A",times= 8),
  "col8"=rep("G",times= 8),
  "col9"=rep("C",times= 8),
  "col10"=rep("C",times=8),
  "col11"=rep("B",times= 8),
  "col12"=rep("A",times= 8),
  "col13"=rep("G",times= 8),
  "col14"=rep("C",times= 8),
  "col15"=rep("C",times=8)  
)

df_out

The final result should look like this:

df_out2 <- data.frame(
  "name" = c("1", "2", "3", "4", "5", "6", "7", "8"),
  "col1"=rep("BAG",times= 8),
  "col2"=rep("CCB",times= 8),
  "col3"=rep("AGC",times= 8),
  "col4"=rep("CBA",times= 8),
  "col5"=rep("GCC",times=8))

df_out2

Any help is appreciated. Also, can you please recommend a book or a good website for learning loops in R? Thank you.

CodePudding user response:

Please find below one possible solution using tidyr and dplyr libraries.

Reprex

  • Code
library(tidyr)
library(dplyr)

cnames <- paste0("col", 1:((ncol(df_out)-1)/3))  
separators <- seq(from = 3, to = (ncol(df_out)-1)-3, by = 3)

df_out %>% 
  select(-name) %>% 
  unite(newCol, everything(), sep = "") %>% # unite all remaining columns, however many there are (15 here, 900 in your data)
  separate(newCol, into = cnames, sep = separators) %>% 
  cbind(df_out[1], .)
  • Output
#>   name col1 col2 col3 col4 col5
#> 1    1  BAG  CCB  AGC  CBA  GCC
#> 2    2  BAG  CCB  AGC  CBA  GCC
#> 3    3  BAG  CCB  AGC  CBA  GCC
#> 4    4  BAG  CCB  AGC  CBA  GCC
#> 5    5  BAG  CCB  AGC  CBA  GCC
#> 6    6  BAG  CCB  AGC  CBA  GCC
#> 7    7  BAG  CCB  AGC  CBA  GCC
#> 8    8  BAG  CCB  AGC  CBA  GCC

Created on 2022-02-20 by the reprex package (v2.0.1)
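
To make the two helper vectors above less opaque, here is a minimal sketch of what they evaluate to for the 15 data columns of the example. The variable n_data_cols is a name introduced here for illustration only; it stands in for ncol(df_out) - 1. Note that splitting by character position relies on every cell holding exactly one character, as in the example data; with longer values the cut points would no longer line up with the column boundaries.

```r
# Hypothetical stand-in for ncol(df_out) - 1 (the "name" column is excluded)
n_data_cols <- 15

# One new column name per group of three original columns
cnames <- paste0("col", 1:(n_data_cols / 3))

# Character positions after which the united string is cut
separators <- seq(from = 3, to = n_data_cols - 3, by = 3)

cnames      # "col1" "col2" "col3" "col4" "col5"
separators  # 3 6 9 12
```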

CodePudding user response:

Base R option.

Using split.default, we can split the data frame into groups of three columns each, then use do.call with paste0 to concatenate the values of each group row by row.

n <- 3
tmp <- df_out[-1]

cbind(df_out[1], 
      sapply(split.default(tmp, ceiling(seq_along(tmp)/n)), 
             function(x) do.call(paste0, x)))

#  name   1   2   3   4   5
#1    1 BAG CCB AGC CBA GCC
#2    2 BAG CCB AGC CBA GCC
#3    3 BAG CCB AGC CBA GCC
#4    4 BAG CCB AGC CBA GCC
#5    5 BAG CCB AGC CBA GCC
#6    6 BAG CCB AGC CBA GCC
#7    7 BAG CCB AGC CBA GCC
#8    8 BAG CCB AGC CBA GCC
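
Since the question also asks about loops, the same idea can be sketched with an explicit for loop. This is only one possible way to write it, not the answer's own method; the names starts, idx and result are introduced here for illustration, and the example data is rebuilt compactly so the block runs on its own. Unlike the sapply version, it names the new columns col1, col2, and so on.

```r
# Rebuild the example data: a "name" column plus 15 single-letter columns
letters_in <- c("B", "A", "G", "C", "C", "B", "A", "G", "C", "C",
                "B", "A", "G", "C", "C")
df_out <- data.frame(name = as.character(1:8), stringsAsFactors = FALSE)
for (i in seq_along(letters_in)) {
  df_out[[paste0("col", i)]] <- rep(letters_in[i], 8)
}

n <- 3                               # number of columns to merge at a time
tmp <- df_out[-1]                    # data columns only
starts <- seq(1, ncol(tmp), by = n)  # first column index of each group
result <- df_out["name"]             # start from the "name" column

for (k in seq_along(starts)) {
  # Column indices of the k-th group (the last group may be shorter)
  idx <- starts[k]:min(starts[k] + n - 1, ncol(tmp))
  # Paste the group's columns together row by row
  result[[paste0("col", k)]] <- do.call(paste0, tmp[idx])
}

result
```

With 900 data columns, the loop would run 300 times and produce col1 through col300 without any other change.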