Split dataframe based on column header prefix and preserve additional desired columns in R-CodePudding

Here is my dataframe:

df <- structure(list(contig = c("F26D10.3.1", "F26D10.3.1", "F26D10.3.1", 
"F26D10.3.1"), sample_name = structure(1:4, .Label = c("GLD4_2", 
"GLD4_3", "WT_1", "WT_2"), class = "factor"), group = structure(c(1L, 
1L, 2L, 2L), .Label = c("GLD4", "WT"), class = "factor"), counts_C = c(90L, 
134L, 130L, 167L), hits_C = c(99, 147, 142, 174), counts_G = c(16L, 
34L, 28L, 35L), hits_G = c(16, 39, 32, 38), counts_U = c(904L, 
1150L, 1027L, 1602L), hits_U = c(1480, 1904, 1700, 2661)), row.names = c(NA, 
-4L), class = c("tbl_df", "tbl", "data.frame"))

I want to split it into 2 dataframes, each containing 3 first columns followed by:

all the remaining columns with names containing "counts_"
all the remaining columns with names containing "hits_"

I am obtaining this by following code:

grouping_vars <- df[,1:3]

wanted <- sapply(c("counts_", "hits_"),
       function(x) df[startsWith(names(df),x)],
       simplify = FALSE)

df1 <- cbind(grouping_vars, wanted[1])
df2 <- cbind(grouping_vars, wanted[2])

However, this results in repeated header segments (e.g. hits_.hits_C). I can trim this, off course, but looking for more efficient way for the entire operation.

CodePudding user response：

We could use split.default by removing the substring from the column name into a list of data.frames and then append the first three columns

lapply(split.default(df[-(1:3)], sub("_.*", "", names(df)[-(1:3)])), 
     function(x) cbind(df[1:3], x))

-output

$counts
      contig sample_name group counts_C counts_G counts_U
1 F26D10.3.1      GLD4_2  GLD4       90       16      904
2 F26D10.3.1      GLD4_3  GLD4      134       34     1150
3 F26D10.3.1        WT_1    WT      130       28     1027
4 F26D10.3.1        WT_2    WT      167       35     1602

$hits
      contig sample_name group hits_C hits_G hits_U
1 F26D10.3.1      GLD4_2  GLD4     99     16   1480
2 F26D10.3.1      GLD4_3  GLD4    147     39   1904
3 F26D10.3.1        WT_1    WT    142     32   1700
4 F26D10.3.1        WT_2    WT    174     38   2661