Reorder and Group Multiple Columns by Regex/Pattern

I have the following:

a_aa a_ab a_ac b_aa b_ab b_ac
2    3    3    3    1    2
3    4    1    1    3    1

Desired outcome:

a_aa b_aa a_ab b_ab a_ac b_ac
2    3    3    1    3    2
3    1    4    3    1    1

Code with data:

d <- "a_aa a_ab a_ac b_aa   b_ab b_ac
2    3    3    3     1    2
3    4    1    1     3    1"
dd <- read.table(textConnection(object = d), header = TRUE)

My current solution is manual:

    dd %>% select(a_aa, b_aa, a_ab, b_ab, a_ac, b_ac)

However, this becomes onerous when the number of columns is large. Is there a way to do this kind of grouped column ordering automatically (e.g. the sequence a_etc1, b_etc1, a_etc2, b_etc2)? Thank you!

CodePudding user response:

Here is one way to solve your problem:

# Strip the prefix up to the underscore, then order the columns by what is left
dd[order(gsub(".+_", "", names(dd)))]

# or, with dplyr

dd %>%
  select(order(gsub(".+_", "", names(.))))


  a_aa b_aa a_ab b_ab a_ac b_ac
1    2    3    3    1    3    2
2    3    1    4    3    1    1
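
To see why ordering on the stripped names gives the grouped layout, here is a minimal base R sketch of the intermediate steps. It assumes the pattern is meant to remove everything up to and including the underscore (written here as `.+_`); the names `suffix`, `idx` and `reordered` are only illustrative.

```r
# Rebuild the question's data so the sketch runs on its own.
dd <- read.table(text = "a_aa a_ab a_ac b_aa b_ab b_ac
2 3 3 3 1 2
3 4 1 1 3 1", header = TRUE)

# Remove everything up to and including the last underscore, leaving the suffix.
suffix <- sub(".+_", "", names(dd))
suffix
#> [1] "aa" "ab" "ac" "aa" "ab" "ac"

# order() sorts by suffix; tied suffixes keep their original left-to-right
# position, so a_* stays ahead of b_* inside each group.
idx <- order(suffix)
idx
#> [1] 1 4 2 5 3 6

reordered <- dd[idx]
names(reordered)
#> [1] "a_aa" "b_aa" "a_ab" "b_ab" "a_ac" "b_ac"
```

Because ties are left in their original order, the result within each suffix group depends on how the columns were arranged to begin with.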

CodePudding user response:

You could do something like this:

library(tidyverse)

d <- "a_aa  a_ab    a_ac    b_aa    b_ab    b_ac
2   3   1   3   3   2
3   1   3   4   1   1"
dd <- read.table(textConnection(object = d), header = TRUE)

colnames(dd) %>% 
  str_split("_") %>% 
  map_chr(~.x[2]) %>% 
  unique() -> vars

dd %>% 
  select(ends_with(all_of(vars)))
#>   a_aa b_aa a_ab b_ab a_ac b_ac
#> 1    2    3    3    3    1    2
#> 2    3    4    1    1    3    1
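
As a rough illustration of what the selection above is doing, the following base R sketch collects the unique suffixes and then, for each suffix in turn, picks up the columns whose names end with it. It is only meant to mirror the idea, not to reproduce how `ends_with()` is implemented; `idx` and `grouped` are illustrative names.

```r
# Same data as in this answer, rebuilt so the sketch runs on its own.
dd <- read.table(text = "a_aa a_ab a_ac b_aa b_ab b_ac
2 3 1 3 3 2
3 1 3 4 1 1", header = TRUE)

# Unique suffixes, in order of first appearance.
vars <- unique(sapply(strsplit(names(dd), "_"), function(x) x[2]))
vars
#> [1] "aa" "ab" "ac"

# For each suffix in turn, collect the positions of the columns ending with it.
idx <- unlist(lapply(vars, function(v) which(endsWith(names(dd), v))))
idx
#> [1] 1 4 2 5 3 6

grouped <- dd[idx]
names(grouped)
#> [1] "a_aa" "b_aa" "a_ab" "b_ab" "a_ac" "b_ac"
```

Two things to keep in mind with this kind of suffix matching: the groups come out in the order the suffixes first appear, not alphabetically, and a suffix that is also the tail of another suffix (say `a` and `aa`) would match more columns than intended.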

If you would rather not load any tidyverse package other than dplyr, you can build the same vector of suffixes with base R string functions:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

d <- "a_aa  a_ab    a_ac    b_aa    b_ab    b_ac
2   3   1   3   3   2
3   1   3   4   1   1"
dd <- read.table(textConnection(object = d), header = T)

colnames(dd) %>% 
  strsplit("_") %>% 
  sapply(\(.x) .x[2]) %>% 
  unique() -> vars

dd %>% 
  select(ends_with(all_of(vars)))
#>   a_aa b_aa a_ab b_ab a_ac b_ac
#> 1    2    3    3    3    1    2
#> 2    3    4    1    1    3    1

Created on 2022-07-10 by the reprex package (v2.0.1)
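
The approaches in this thread keep columns in their existing order within each suffix group, so they work as shown when the `a_` columns already come before the `b_` columns. If the columns might arrive shuffled, one possible sketch is to sort on two keys, suffix first and prefix second. The data frame below is hypothetical (it is not the question's data), and `prefix` and `suffix` are illustrative names.

```r
# Hypothetical data with shuffled columns, for illustration only.
dd <- data.frame(b_ab = 1, a_aa = 2, b_aa = 3, a_ab = 4)

prefix <- sub("_.*", "", names(dd))  # text before the first underscore
suffix <- sub(".+_", "", names(dd))  # text after the last underscore

# Sort by suffix, breaking ties by prefix.
sorted <- dd[order(suffix, prefix)]
names(sorted)
#> [1] "a_aa" "b_aa" "a_ab" "b_ab"
```

This assumes each name has exactly one underscore separating prefix from suffix; names with several underscores would need a pattern chosen for that layout.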