I have a data frame with around 900 columns, and I want to combine the contents of every 3 columns into one column. I tried to write a loop, but I'm not good at it. Can you please help me?
The data frame looks like this:
df_out <- data.frame(
  "name" = c("1", "2", "3", "4", "5", "6", "7", "8"),
  "col1" = rep("B", times = 8),
  "col2" = rep("A", times = 8),
  "col3" = rep("G", times = 8),
  "col4" = rep("C", times = 8),
  "col5" = rep("C", times = 8),
  "col6" = rep("B", times = 8),
  "col7" = rep("A", times = 8),
  "col8" = rep("G", times = 8),
  "col9" = rep("C", times = 8),
  "col10" = rep("C", times = 8),
  "col11" = rep("B", times = 8),
  "col12" = rep("A", times = 8),
  "col13" = rep("G", times = 8),
  "col14" = rep("C", times = 8),
  "col15" = rep("C", times = 8)
)
df_out
So the final result should look like this:
df_out2 <- data.frame(
  "name" = c("1", "2", "3", "4", "5", "6", "7", "8"),
  "col1" = rep("BAG", times = 8),
  "col2" = rep("CCB", times = 8),
  "col3" = rep("AGC", times = 8),
  "col4" = rep("CBA", times = 8),
  "col5" = rep("GCC", times = 8)
)
df_out2
Any help is appreciated. Also, can you recommend a book or a good website for learning loops in R? Thank you!
Answer:
Please find below one possible solution using the tidyr and dplyr libraries.
Reprex
- Code
library(tidyr)
library(dplyr)
cnames <- paste0("col", 1:((ncol(df_out)-1)/3))
separators <- seq(from = 3, to = (ncol(df_out)-1)-3, by = 3)
df_out %>%
select(-name) %>%
unite(newCol, "col1":"col15", sep = "") %>% # for your case, please change "col1":"col15" to "col1":"col900"
separate(newCol, into = cnames, sep = separators) %>%
cbind(df_out[1], .)
- Output
#> name col1 col2 col3 col4 col5
#> 1 1 BAG CCB AGC CBA GCC
#> 2 2 BAG CCB AGC CBA GCC
#> 3 3 BAG CCB AGC CBA GCC
#> 4 4 BAG CCB AGC CBA GCC
#> 5 5 BAG CCB AGC CBA GCC
#> 6 6 BAG CCB AGC CBA GCC
#> 7 7 BAG CCB AGC CBA GCC
#> 8 8 BAG CCB AGC CBA GCC
Created on 2022-02-20 by the reprex package (v2.0.1)
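Since the question also asks about loops, here is a minimal sketch of the same transformation written as an explicit for loop in base R. It assumes, as in the example, that `name` is the first column and that the number of remaining columns is a multiple of 3; the sample data is rebuilt compactly so the block runs on its own. Because it pastes the columns together directly instead of splitting one long string at fixed character positions, it does not depend on every cell holding exactly one character, which the separate() step above does.

```r
# Rebuild the question's example compactly: 'name' plus col1..col15
df_out <- data.frame(name = as.character(1:8))
letters_seq <- rep(c("B", "A", "G", "C", "C"), times = 3)
for (i in seq_along(letters_seq)) {
  df_out[[paste0("col", i)]] <- rep(letters_seq[i], times = 8)
}

n <- 3                        # how many columns to combine per group
vals <- df_out[-1]            # every column except 'name'
n_groups <- ncol(vals) %/% n  # assumes ncol(vals) is a multiple of n

result <- df_out[1]           # start the output with the 'name' column
for (g in seq_len(n_groups)) {
  cols <- ((g - 1) * n + 1):(g * n)  # positions of the columns in group g
  # paste the group's columns together element-wise: one string per row
  result[[paste0("col", g)]] <- do.call(paste0, vals[cols])
}
result
```

With 900 data columns the same loop would produce 300 output columns, named col1 to col300.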
Answer:
Base R option. Using split.default, we can split the data frame into groups of 3 columns each, then use do.call with paste0 to combine the values of each group row by row.
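To make the grouping step concrete, here is a small illustration on a hypothetical 2-row, 6-column data frame (`tiny` and its column names are made up for this sketch). `ceiling(seq_along(x) / n)` gives the same group number to each run of `n` consecutive columns, and `split.default` then returns one data frame per group, which `do.call(paste0, ...)` collapses into one string per row:

```r
# Hypothetical 6-column data frame, used only to illustrate the grouping
tiny <- data.frame(a = c("B", "x"), b = c("A", "y"), c = c("G", "z"),
                   d = c("C", "p"), e = c("C", "q"), f = c("B", "r"))
n <- 3
groups <- ceiling(seq_along(tiny) / n)  # 1 1 1 2 2 2
pieces <- split.default(tiny, groups)   # list of two 3-column data frames
do.call(paste0, pieces[[1]])            # "BAG" "xyz"
do.call(paste0, pieces[[2]])            # "CCB" "pqr"
```

One caveat: do.call passes each column as a named argument, so a data column literally named `collapse` would be taken as paste0's `collapse` argument instead of being pasted.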
n <- 3
tmp <- df_out[-1]
cbind(df_out[1],
sapply(split.default(tmp, ceiling(seq_along(tmp)/n)),
function(x) do.call(paste0, x)))
# name 1 2 3 4 5
#1 1 BAG CCB AGC CBA GCC
#2 2 BAG CCB AGC CBA GCC
#3 3 BAG CCB AGC CBA GCC
#4 4 BAG CCB AGC CBA GCC
#5 5 BAG CCB AGC CBA GCC
#6 6 BAG CCB AGC CBA GCC
#7 7 BAG CCB AGC CBA GCC
#8 8 BAG CCB AGC CBA GCC
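The base R result labels the combined columns 1 to 5 (the group names produced by split.default), whereas the desired df_out2 uses col1 to col5. A minimal sketch of renaming them, reusing the answer's code on the question's example data (rebuilt compactly so the block runs on its own). It assumes the data frame has more than one row: with a single row, sapply returns a plain vector rather than a matrix, and the renaming step would fail.

```r
# Rebuild the question's example compactly: 'name' plus col1..col15
df_out <- data.frame(name = as.character(1:8))
vals <- rep(c("B", "A", "G", "C", "C"), times = 3)
for (i in seq_along(vals)) df_out[[paste0("col", i)]] <- rep(vals[i], 8)

n <- 3
tmp <- df_out[-1]
combined <- sapply(split.default(tmp, ceiling(seq_along(tmp) / n)),
                   function(x) do.call(paste0, x))
# sapply names the columns after the groups ("1", "2", ...); rename them
colnames(combined) <- paste0("col", seq_len(ncol(combined)))
result <- cbind(df_out[1], combined)
result
```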