Here is my dataframe:
df <- structure(list(contig = c("F26D10.3.1", "F26D10.3.1", "F26D10.3.1",
"F26D10.3.1"), sample_name = structure(1:4, .Label = c("GLD4_2",
"GLD4_3", "WT_1", "WT_2"), class = "factor"), group = structure(c(1L,
1L, 2L, 2L), .Label = c("GLD4", "WT"), class = "factor"), counts_C = c(90L,
134L, 130L, 167L), hits_C = c(99, 147, 142, 174), counts_G = c(16L,
34L, 28L, 35L), hits_G = c(16, 39, 32, 38), counts_U = c(904L,
1150L, 1027L, 1602L), hits_U = c(1480, 1904, 1700, 2661)), row.names = c(NA,
-4L), class = c("tbl_df", "tbl", "data.frame"))
I want to split it into 2 dataframes, each containing 3 first columns followed by:
- all the remaining columns with names containing "counts_"
- all the remaining columns with names containing "hits_"
I am obtaining this by following code:
grouping_vars <- df[,1:3]
wanted <- sapply(c("counts_", "hits_"),
function(x) df[startsWith(names(df),x)],
simplify = FALSE)
df1 <- cbind(grouping_vars, wanted[1])
df2 <- cbind(grouping_vars, wanted[2])
However, this results in repeated header segments (e.g. hits_.hits_C). I can trim this, off course, but looking for more efficient way for the entire operation.
CodePudding user response:
We could use split.default
by removing the substring from the column name into a list
of data.frames and then append the first three columns
lapply(split.default(df[-(1:3)], sub("_.*", "", names(df)[-(1:3)])),
function(x) cbind(df[1:3], x))
-output
$counts
contig sample_name group counts_C counts_G counts_U
1 F26D10.3.1 GLD4_2 GLD4 90 16 904
2 F26D10.3.1 GLD4_3 GLD4 134 34 1150
3 F26D10.3.1 WT_1 WT 130 28 1027
4 F26D10.3.1 WT_2 WT 167 35 1602
$hits
contig sample_name group hits_C hits_G hits_U
1 F26D10.3.1 GLD4_2 GLD4 99 16 1480
2 F26D10.3.1 GLD4_3 GLD4 147 39 1904
3 F26D10.3.1 WT_1 WT 142 32 1700
4 F26D10.3.1 WT_2 WT 174 38 2661