I am looking for an easy, concise way to use dplyr::select
without rearranging columns.
Consider this dataset:
library(tidyverse)
head(msleep)
#> # A tibble: 6 × 11
#> name genus vore order conservation sleep_total sleep_rem sleep_cycle awake
#> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Cheetah Acin… carni Carn… lc 12.1 NA NA 11.9
#> 2 Owl mo… Aotus omni Prim… <NA> 17 1.8 NA 7
#> 3 Mounta… Aplo… herbi Rode… nt 14.4 2.4 NA 9.6
#> 4 Greate… Blar… omni Sori… lc 14.9 2.3 0.133 9.1
#> 5 Cow Bos herbi Arti… domesticated 4 0.7 0.667 20
#> 6 Three-… Brad… herbi Pilo… <NA> 14.4 2.2 0.767 9.6
#> # … with 2 more variables: brainwt <dbl>, bodywt <dbl>
If I select vore, genus, and name, the resulting data frame is arranged in the order in which the columns were provided.
msleep %>% select(vore, genus, name)
#> # A tibble: 83 × 3
#> vore genus name
#> <chr> <chr> <chr>
#> 1 carni Acinonyx Cheetah
#> 2 omni Aotus Owl monkey
#> 3 herbi Aplodontia Mountain beaver
#> 4 omni Blarina Greater short-tailed shrew
#> 5 herbi Bos Cow
#> 6 herbi Bradypus Three-toed sloth
#> 7 carni Callorhinus Northern fur seal
#> 8 <NA> Calomys Vesper mouse
#> 9 carni Canis Dog
#> 10 herbi Capreolus Roe deer
#> # … with 73 more rows
I would instead like to leave them in their default order: name, genus, then vore.
I have a solution (see below), but I do not like it because it is quite wordy, and not completely “tidyverse-esque”. (I am teaching an intro to tidyverse course, and would like something that would not intimidate beginners.)
msleep %>%
  select(all_of(names(msleep)[names(msleep) %in% c("vore", "genus", "name")]))
#> # A tibble: 83 × 3
#> name genus vore
#> <chr> <chr> <chr>
#> 1 Cheetah Acinonyx carni
#> 2 Owl monkey Aotus omni
#> 3 Mountain beaver Aplodontia herbi
#> 4 Greater short-tailed shrew Blarina omni
#> 5 Cow Bos herbi
#> 6 Three-toed sloth Bradypus herbi
#> 7 Northern fur seal Callorhinus carni
#> 8 Vesper mouse Calomys <NA>
#> 9 Dog Canis carni
#> 10 Roe deer Capreolus herbi
#> # … with 73 more rows
Is there such a thing? Thank you!
For context: In reality, we have a data frame with about 400 columns, from which we are selecting ~10-20 at a time to work with. The order of the columns in the original data frame is meaningful, but we don't want to have to labor over listing them in their correct order in the select statements. A very specific need, I'll admit.
Created on 2021-12-22 by the reprex package (v2.0.1)
CodePudding user response:
We could use match with sort:
library(dplyr)
msleep %>%
  select(sort(match(c("vore", "genus", "name"), names(.))))
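To see why this works, it may help to look at the two steps separately. The sketch below uses a shortened, hard-coded vector of column names (an assumption standing in for names(msleep)): match() returns the position of each requested name, and sort() puts those positions back into the data frame's own order.

```r
# Stand-in for names(msleep), shortened for illustration
cols <- c("name", "genus", "vore", "order", "conservation")
wanted <- c("vore", "genus", "name")

# Position of each requested name, in the order it was requested
pos <- match(wanted, cols)
pos
#> [1] 3 2 1

# The same positions, rearranged into data-frame order
sort(pos)
#> [1] 1 2 3

cols[sort(pos)]
#> [1] "name"  "genus" "vore"
```

Note that match() returns NA for a name that is not present and sort() drops NA values by default, so a misspelled column name would be skipped silently rather than raising an error.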
EDIT: Based on the OP's comments
CodePudding user response:
Update: If the column names are supplied as a vector, we could build the pattern as akrun suggests in the comments:
nm1 <- c("vore", "genus", "name")
pattern <- str_c(nm1, collapse = "|")  # str_c() comes from stringr
Original answer:
You could first define a string with the search items and then use matches:
pattern <- "vore|genus|name"
select(msleep, matches(pattern))
name genus vore
<chr> <chr> <chr>
1 Cheetah Acinonyx carni
2 Owl monkey Aotus omni
3 Mountain beaver Aplodontia herbi
4 Greater short-tailed shrew Blarina omni
5 Cow Bos herbi
6 Three-toed sloth Bradypus herbi
7 Northern fur seal Callorhinus carni
8 Vesper mouse Calomys NA
9 Dog Canis carni
10 Roe deer Capreolus herbi
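One thing to watch with this approach: matches() treats the pattern as a regular expression, so it also picks up columns whose names merely contain one of the search terms. That does not happen with msleep, but it could in a data frame with several hundred columns. The sketch below uses base R's grepl() on made-up column names (genus_id and nickname are hypothetical) to show the difference an anchored pattern makes.

```r
nm1 <- c("vore", "genus", "name")

# Hypothetical column names; "genus_id" and "nickname" are invented for illustration
cols <- c("name", "genus", "genus_id", "nickname", "vore")

# Unanchored pattern: matches any name that contains one of the terms
loose <- paste(nm1, collapse = "|")
loose_hits <- cols[grepl(loose, cols)]
loose_hits
#> [1] "name"     "genus"    "genus_id" "nickname" "vore"

# Anchored pattern: matches only names that equal one of the terms
anchored <- paste0("^(", loose, ")$")
anchored_hits <- cols[grepl(anchored, cols)]
anchored_hits
#> [1] "name"  "genus" "vore"
```

Assuming the same regular expression is passed on, select(msleep, matches(anchored)) should then keep only exact name matches, still in the data frame's original order.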
CodePudding user response:
You can use the power of eval_select() to create a function to select and sort the columns.
library(dplyr)
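select_in_order <- function(data, ...) {
  # Resolve the selection to column positions, then sort them into data-frame order
  ordered_cols <- sort(tidyselect::eval_select(expr(c(...)), data))
  # all_of() marks ordered_cols as an external vector, not a column name
  select(data, all_of(ordered_cols))
}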
So now this will do what you are asking. The benefit is that it supports everything you are used to being able to pass to a select() statement.
# library(ggplot2) # msleep is in ggplot2
msleep %>%
  select_in_order(vore, genus, name)
# this will work as well
msleep %>%
  select_in_order(starts_with("sleep"), vore, name:genus)
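For anyone curious why sorting is enough here, the following sketch shows what eval_select() hands back. It uses a toy tibble with columns x, y, and z (not part of the question) and assumes dplyr and tidyselect are installed. The result is a named integer vector of column positions in the order requested, so sort() simply restores the data frame's order while keeping the names attached.

```r
library(dplyr)

# Toy data, invented for illustration
df <- tibble(x = 1, y = 2, z = 3)

# Ask for z first, then x: positions come back in the order requested
pos <- tidyselect::eval_select(expr(c(z, x)), df)
pos

# Sorting the positions restores the original column order; names are kept
sorted <- sort(pos)
names(sorted)
```

Because the positions are resolved by tidyselect itself, helpers such as starts_with() and ranges such as name:genus end up as plain positions too, which is why the wrapper accepts them.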