Use dplyr::select's where with base R grepl and an anonymous function

Time:05-15

There is a very similar question here: How to select columns based on grep in dplyr::tibble

However, I think that select_if was superseded by select(where()).

I know that I can do the following and it works:

# select all columns whose names consist of exactly three letters
mtcars %>% 
  select(
    matches("^[a-zA-Z]{3}$")
  )

But I can also use an anonymous function (applied here to the column values, not the names) to select columns.

mtcars %>% 
  select(
    where(function(x)sum(is.na(x)) == 0)
  )

So I thought I could use an anonymous function with grepl to select columns by name. But this does not work:

mtcars %>% 
  select(
    where(
      function(x) grepl("^[a-zA-Z]{3}$", x)
    )
  )

How could I make this work? I could always use the matches helper, but I would like to understand how to use a select(where()) statement over the names of the data frame rather than over the values in each column.

Update

This works:

mtcars %>% 
  select(
    which(grepl("^[a-zA-Z]{3}$", names(.)))
  )

But I am not sure whether there is a better way ;)

CodePudding user response:

You could just use grep() with select() without the where() function.

mtcars %>% 
  select(grep("^[a-zA-Z]{3}$", names(.)))
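The same idea can also be written with column names instead of positions, which avoids relying on the magrittr `.` placeholder. This is only a minimal sketch, assuming dplyr is attached; `select_names_where` is a hypothetical helper name used for illustration and is not part of dplyr.

```r
library(dplyr)

# Hypothetical helper (not a dplyr function): keep the columns whose
# *names* satisfy a predicate.
select_names_where <- function(df, pred) {
  # pred must return one TRUE/FALSE per column name
  keep <- names(df)[pred(names(df))]
  # all_of() tells select() that `keep` is a character vector of names
  select(df, all_of(keep))
}

three_letter <- select_names_where(
  mtcars,
  function(nm) grepl("^[a-zA-Z]{3}$", nm)
)
names(three_letter)
```

Here the predicate receives the vector of names once, rather than being called once per column as in where().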

Your initial attempt didn't work because in this code:

mtcars %>% 
  select(
    where(
      function(x) grepl("^[a-zA-Z]{3}$", x)
    )
  )

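the x passed to the function inside where() holds the values of each column, not its name. That is why something like where(is.numeric) works: the predicate is evaluated on the actual values. In your attempt, grepl() is therefore matched against the numbers in each column and returns one result per row, not the single TRUE or FALSE that where() needs.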
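The difference between values and names can be checked directly. The following is a small sketch, assuming dplyr is attached; the variable names (`pattern`, `on_values`, `attempt`, `by_name`) are chosen here for illustration only.

```r
library(dplyr)

pattern <- "^[a-zA-Z]{3}$"

# What the predicate sees for a single column: the values of mpg.
# grepl() returns one logical per value, not one per column.
on_values <- grepl(pattern, mtcars$mpg)
length(on_values)

# Passing such a predicate to where() fails, because where() expects
# a single TRUE or FALSE for each column. tryCatch() captures the error.
attempt <- tryCatch(
  mtcars %>% select(where(function(x) grepl(pattern, x))),
  error = function(e) "error"
)

# Selecting by name is the job of the name-based helpers such as matches().
by_name <- mtcars %>% select(matches(pattern))
names(by_name)
```

In short, where() answers questions about what a column contains, while matches() and the other name helpers answer questions about what a column is called.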

CodePudding user response:

Base R option:

mtcars[grepl("^[a-zA-Z]{3}$", names(mtcars))]

Output:

                     mpg cyl
Mazda RX4           21.0   6
Mazda RX4 Wag       21.0   6
Datsun 710          22.8   4
Hornet 4 Drive      21.4   6
Hornet Sportabout   18.7   8
Valiant             18.1   6
Duster 360          14.3   8
Merc 240D           24.4   4
Merc 230            22.8   4
Merc 280            19.2   6
Merc 280C           17.8   6
Merc 450SE          16.4   8
Merc 450SL          17.3   8
Merc 450SLC         15.2   8
Cadillac Fleetwood  10.4   8
Lincoln Continental 10.4   8
Chrysler Imperial   14.7   8
Fiat 128            32.4   4
Honda Civic         30.4   4
Toyota Corolla      33.9   4
Toyota Corona       21.5   4
Dodge Challenger    15.5   8
AMC Javelin         15.2   8
Camaro Z28          13.3   8
Pontiac Firebird    19.2   8
Fiat X1-9           27.3   4
Porsche 914-2       26.0   4
Lotus Europa        30.4   4
Ford Pantera L      15.8   8
Ferrari Dino        19.7   6
Maserati Bora       15.0   8
Volvo 142E          21.4   4

CodePudding user response:

select_if (now superseded) also accepts a logical vector in place of a predicate function, so grepl can be applied to the column names:

library(dplyr)
mtcars %>%
    select_if(grepl("^[a-zA-Z]{3}$", names(.)))