Home > Software engineering >  Splitting columns based on a specific value in the name with R
Splitting columns based on a specific value in the name with R

Time:07-02

I have this table

abc[1] abc[1] abc[2] abc[2] abc[3] abc[3] def[1] def[1] def[2] def[2] def[3] def[3]
One Two Three One Two Three One Two Three One Two Three
One Two Three One Two Three One Two Three One Two Three One Three Two One Three Two
One Two Three One Two Three One Two Three One Two Three

It contains columns that have a name and then a number in brackets, like you see: [1], [2], [3].. This is just an example with numbers from 1 to 3, but it can be bigger, like a 100. I would like to have a script who selects and put in a new df all the columns containing in the name [1], [2], [3]...[n]. I know that I can select with tidy verse with select(contains("..")) but since the numbers in the brackets can change in my initial table (they can be 1 to 3 or 1 to 100) I need something applicable to different cases. I was thinking of maybe generating a string from 1:n, but the problem is that the file I have always have the number in brackets []. And I can't just use contain with the numbers because unfortunately the names of the columns contains characters and numbers. This is the result I am looking for:

New table 1

abc[1] abc[1] def[1] def[1]
One Two One Two
One Two One Two
One Two One Two
One Two One Two

New table 2

abc[2] abc[2] def[2] def[2]
One Two One Two
One Two One Two
One Two One Two
One Two One Two

New table 3

abc[3] abc[3] def[3] def[3]
One Two One Two
One Two One Two
One Two One Two
One Two One Two

CodePudding user response:

You can use data.table, in the following way

library(data.table)

setDT(df)

lapply(1:3, \(i)  {
  cols = (1:ncol(df))[grepl(paste0("\\[",i,"\\]"),names(df))]
  df[,.SD, .SDcols = cols]
})

Output:

[[1]]
   abc[1] abc[1] def[1] def[1]
1:    One    Two    One    Two
2:    One    Two    One    Two
3:    One    Two    One    Two

[[2]]
   abc[2] abc[2] def[2] def[2]
1:  Three    One  Three    One
2:  Three    One  Three    One
3:  Three    One  Three    One

[[3]]
   abc[3] abc[3] def[3] def[3]
1:    Two  Three    Two  Three
2:    Two  Three    Two  Three
3:    Two  Three    Two  Three

Input:

df = structure(list(`abc[1]` = c("One", "One", "One"), `abc[1]` = c("Two", 
"Two", "Two"), `abc[2]` = c("Three", "Three", "Three"), `abc[2]` = c("One", 
"One", "One"), `abc[3]` = c("Two", "Two", "Two"), `abc[3]` = c("Three", 
"Three", "Three"), `def[1]` = c("One", "One", "One"), `def[1]` = c("Two", 
"Two", "Two"), `def[2]` = c("Three", "Three", "Three"), `def[2]` = c("One", 
"One", "One"), `def[3]` = c("Two", "Two", "Two"), `def[3]` = c("Three", 
"Three", "Three")), row.names = c(NA, -3L), class = "data.frame")
  • Related