dplyr::select equivalent for subsetting a character vector

Time:11-20

Does anyone know whether there's an equivalent of dplyr::select() for subsetting character vectors? Specifically, what I like about dplyr::select() is how easy it is to select data frame columns: one can input the names of the columns (with or without quotes) and similarly create ranges from those names. Is there any function that takes a character vector and, without any helper functions, lets you create a range for the purposes of subsetting?

For example, let's create an arbitrary vector (but assume that this vector was the result of some other operation, like pulling sheet names from an Excel file):

char_vec <- c("Exclude1",
  paste0("Include", 1:5),
  "Exclude2")

Is there a function where a user could input: someFunction(char_vec, Include1:Include5) in order to get the result?

I realize that there are a bunch of regex related solutions as well as good old base R -- char_vec[which(char_vec == "Include1"):which(char_vec == "Include5")] -- but I was hoping for a function that resembled dplyr::select()

CodePudding user response:

A bit hacky, but you could create a function that changes your character vector into dataframe column names, applies tidy selection helpers, then converts back to a character vector:

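select_v <- function(x, ...) {
  # Build a one-row dummy data frame with one column per element of x
  x_df <- as.data.frame(matrix(ncol = length(x)))
  names(x_df) <- x
  # Let dplyr::select() resolve the selection, then return the kept names
  names(dplyr::select(x_df, ...))
}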

char_vec |>
  select_v(Include1:Include5)
# "Include1" "Include2" "Include3" "Include4" "Include5"

char_vec |>
  select_v(starts_with("Exc"), ends_with("4"))
# "Exclude1" "Exclude2" "Include4"

Note that, like dplyr::select(), this will throw an error if your selection includes any duplicate values (e.g., if "Include2" appeared twice in the vector).
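
If building a dummy data frame feels too roundabout, a variation on the same idea is to call tidyselect (the package behind dplyr::select()) directly on a named vector. This is only a sketch: select_chr is a hypothetical name, and it assumes the tidyselect and rlang packages are installed.

```r
# Hypothetical helper: apply tidyselect semantics directly to a character vector
select_chr <- function(x, ...) {
  # Name each element after itself so the selection can refer to the values
  named <- stats::setNames(x, x)
  # eval_select() returns the integer positions of the selected names
  pos <- tidyselect::eval_select(rlang::expr(c(...)), named)
  x[unname(pos)]
}

char_vec <- c("Exclude1", paste0("Include", 1:5), "Exclude2")

# Range selection by bare names, as with dplyr::select()
select_chr(char_vec, Include1:Include5)

# Selection helpers, namespaced here so nothing has to be attached
select_chr(char_vec, tidyselect::starts_with("Exc"), tidyselect::ends_with("4"))
```

The behaviour should match select_v above, since both hand the selection to the same tidyselect machinery; the only difference is that no intermediate data frame is created.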

CodePudding user response:

I'm afraid you'll have to go to base R. dplyr's verbs are designed for data frames and tibbles, not for bare atomic vectors. A vector has no columns, so there is nothing for select() to pick from; its elements behave more like rows. If you want to subset those elements with dplyr, convert the vector to a data frame (using as.data.frame(char_vec)) or a tibble (using as_tibble(char_vec) from the tibble package) first, and then use dplyr's filter() verb.
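
A rough sketch of that conversion route, assuming dplyr and tibble are installed. The column name value and the object names are arbitrary choices for illustration, and the range version uses slice() rather than filter() because it works on row positions:

```r
library(dplyr)

char_vec <- c("Exclude1", paste0("Include", 1:5), "Exclude2")

# Wrap the vector in a one-column tibble; `value` is an arbitrary column name
df <- tibble::tibble(value = char_vec)

# Keep rows by condition, then pull the column back out as a character vector
by_prefix <- df |>
  filter(startsWith(value, "Include")) |>
  pull(value)

# Keep a positional range between two known elements
by_range <- df |>
  slice(match("Include1", value):match("Include5", value)) |>
  pull(value)
```

Both by_prefix and by_range end up holding the five "Include" elements. Note that the endpoints still have to be quoted strings here, so this does not give the bare-name range syntax that the question asks for.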
