I have the following:
a_aa a_ab a_ac b_aa b_ab b_ac
2 3 3 3 1 2
3 4 1 1 3 1
Desired outcome:
a_aa b_aa a_ab b_ab a_ac b_ac
2 3 3 1 3 2
3 1 4 3 1 1
Code with data:
d <- "a_aa a_ab a_ac b_aa b_ab b_ac
2 3 3 3 1 2
3 4 1 1 3 1"
dd <- read.table(textConnection(object = d), header = TRUE)
My current solution is manual:
library(dplyr)
dd %>% select(a_aa, b_aa, a_ab, b_ab, a_ac, b_ac)
However, this becomes onerous when the number of columns is large. Any ideas on how to order columns by group like this (e.g. the sequence a_etc1, b_etc1, a_etc2, b_etc2)? Thank you!
CodePudding user response:
Here is one way to solve your problem:
dd[order(gsub(".*_", "", names(dd)))]
# or
dd %>%
  select(order(gsub(".*_", "", names(.))))
a_aa b_aa a_ab b_ab a_ac b_ac
1 2 3 3 1 3 2
2 3 1 4 3 1 1
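To see why this works, it can help to look at the intermediate values. The sketch below uses base R only and reuses the data from the question; the names `suffix` and `idx` are introduced here purely for illustration and are not part of the answer above.

```r
# Rebuild the example data from the question
d <- "a_aa a_ab a_ac b_aa b_ab b_ac
2 3 3 3 1 2
3 4 1 1 3 1"
dd <- read.table(text = d, header = TRUE)

# Strip everything up to and including the last underscore,
# leaving only the part of each name that serves as the sort key
suffix <- gsub(".*_", "", names(dd))
suffix
#> [1] "aa" "ab" "ac" "aa" "ab" "ac"

# order() leaves tied keys in their original order, so within each
# suffix the a_ column stays ahead of the b_ column
idx <- order(suffix)
idx
#> [1] 1 4 2 5 3 6

names(dd[idx])
#> [1] "a_aa" "b_aa" "a_ab" "b_ab" "a_ac" "b_ac"
```

Note that this sorts the suffixes alphabetically rather than by first appearance; the two happen to coincide for this data.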
CodePudding user response:
You could do something like this:
library(tidyverse)
d <- "a_aa a_ab a_ac b_aa b_ab b_ac
2 3 1 3 3 2
3 1 3 4 1 1"
dd <- read.table(textConnection(object = d), header = TRUE)
colnames(dd) %>%
  str_split("_") %>%
  map_chr(~.x[2]) %>%
  unique() -> vars
dd %>%
  select(ends_with(all_of(vars)))
#> a_aa b_aa a_ab b_ab a_ac b_ac
#> 1 2 3 3 3 1 2
#> 2 3 4 1 1 3 1
If you don't want to load any tidyverse package other than dplyr, you can do the splitting in base R:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
d <- "a_aa a_ab a_ac b_aa b_ab b_ac
2 3 1 3 3 2
3 1 3 4 1 1"
dd <- read.table(textConnection(object = d), header = TRUE)
colnames(dd) %>%
  strsplit("_") %>%
  sapply(\(.x) .x[2]) %>%
  unique() -> vars
dd %>%
  select(ends_with(all_of(vars)))
#> a_aa b_aa a_ab b_ab a_ac b_ac
#> 1 2 3 3 3 1 2
#> 2 3 4 1 1 3 1
Created on 2022-07-10 by the reprex package (v2.0.1)
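For readers who want to see what the selection step is doing, here is a rough base R sketch of the same idea. It assumes that `ends_with()` collects the matches for each suffix in the order the suffixes are given, which is what the output above suggests; `idx` is a name made up for this illustration, and the code is not how dplyr implements the selection internally.

```r
# Same data as in the reprex above
d <- "a_aa a_ab a_ac b_aa b_ab b_ac
2 3 1 3 3 2
3 1 3 4 1 1"
dd <- read.table(text = d, header = TRUE)

# Second piece of each column name, de-duplicated in order of first appearance
vars <- unique(sapply(strsplit(names(dd), "_"), function(x) x[2]))
vars
#> [1] "aa" "ab" "ac"

# For each suffix in turn, collect the positions of the columns ending with it
idx <- unlist(lapply(vars, function(v) which(endsWith(names(dd), v))))
idx
#> [1] 1 4 2 5 3 6

names(dd[idx])
#> [1] "a_aa" "b_aa" "a_ab" "b_ab" "a_ac" "b_ac"
```

Because `unique()` keeps the suffixes in order of first appearance, this approach groups columns in the order the suffixes occur in the data rather than alphabetically. One caveat of matching on endings: if one suffix were itself the ending of another (say `a` and `aa`), a column could be picked up under the wrong group.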