How to select specific columns in R using regex

Time:11-18

Because my regex knowledge is limited, I don't know how to select specific columns in R using a regular expression.

Here is a short example. I have a data frame df with many variables.

library(dplyr)

a = c('1.age41_50', '2.age51_60', '3.age61_70', '4.age71_80',
      '5.age1_20', '6.age21_30', '7.age31_40', '8.ageupwith65', '9.agelo65', '10.PM2_5')

# One-row tibble of NA (logical) columns whose names are taken from a
df = matrix(NA, nrow = 1, ncol = length(a), dimnames = list(NULL, a)) %>% as_tibble()
df

I want to select specific variables using select() and matches() from the dplyr package. The regex should satisfy the following condition:

Variable names must not contain both "age" and "_" at the same time.
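The condition can also be written out directly in base R, without a single regex, which is handy as a reference for checking any candidate pattern. This is a sketch only; the variable names drop and keep are illustrative and not part of the original question.

```r
# Column names from the example
a <- c("1.age41_50", "2.age51_60", "3.age61_70", "4.age71_80",
       "5.age1_20", "6.age21_30", "7.age31_40", "8.ageupwith65",
       "9.agelo65", "10.PM2_5")

# A name is dropped only if it contains BOTH "age" and "_".
# fixed = TRUE treats each pattern as a literal string, not a regex.
drop <- grepl("age", a, fixed = TRUE) & grepl("_", a, fixed = TRUE)

# Names that should remain after the selection
keep <- a[!drop]
print(keep)
```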

My idea was to first match the variable names that contain both "age" and "_", and then negate that selection, but it failed. This is what I tried:

df %>% select(!matches('age&_')) 
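A likely reason the attempt fails: regular expressions have no & (AND) operator, so inside a pattern & is an ordinary character that matches a literal ampersand. No column name contains the text "age&_", so matches('age&_') selects nothing and !matches('age&_') keeps every column. A quick base-R check, assuming the same names as in a:

```r
a <- c("1.age41_50", "2.age51_60", "3.age61_70", "4.age71_80",
       "5.age1_20", "6.age21_30", "7.age31_40", "8.ageupwith65",
       "9.agelo65", "10.PM2_5")

# "&" is not a regex operator: this searches for the literal text "age&_"
hits <- grepl("age&_", a)
print(hits)

# The pattern only matches a string that really contains an ampersand
print(grepl("age&_", "x.age&_1"))
```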

The final result should look like this:

df_expected = df %>% select(`8.ageupwith65`, `9.agelo65`, `10.PM2_5`)

Any help will be highly appreciated!

Answer:

You can use:

> df %>% select(!matches('age[0-9]+_'))
# A tibble: 1 x 3
  `8.ageupwith65` `9.agelo65` `10.PM2_5`
  <lgl>           <lgl>       <lgl>     
1 NA              NA          NA        

The pattern matches "age", followed by one or more digits ([0-9]+), followed by an underscore. The ! operator negates the selection, so every column whose name matches the pattern is dropped.
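The pattern age[0-9]+_ relies on digits sitting between "age" and the underscore, which holds for these names. If the goal is literally "contains both age and _, in any order", one alternative is alternation over both orders, age.*_|_.*age. The sketch below assumes a tidyselect version that supports ! inside select() (tidyselect 1.0.0 or later); note also that matches() ignores case by default (ignore.case = TRUE).

```r
library(dplyr)

a <- c("1.age41_50", "2.age51_60", "3.age61_70", "4.age71_80",
       "5.age1_20", "6.age21_30", "7.age31_40", "8.ageupwith65",
       "9.agelo65", "10.PM2_5")

# One-row tibble of NA (logical) columns named after a
m <- matrix(NA, nrow = 1, ncol = length(a), dimnames = list(NULL, a))
df <- as_tibble(m)

# Drop columns whose names contain "age" and "_" in either order
res <- df %>% select(!matches("age.*_|_.*age"))
print(names(res))
```

Writing -matches("age.*_|_.*age") instead of !matches(...) negates the selection in the same way.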
