My regex knowledge is limited, and I don't know how to select specific columns in R using regex.
Here is a short example. I have a data frame `df` that has lots of variables.
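library(dplyr)  # provides %>%, as_tibble(), select() and matches()
a = c('1.age41_50', '2.age51_60', '3.age61_70', '4.age71_80',
'5.age1_20', '6.age21_30', '7.age31_40', '8.ageupwith65', '9.agelo65', '10.PM2_5')
# Name the columns up front so as_tibble() does not have to invent names
df = matrix(ncol = 10, nrow = 1, dimnames = list(NULL, a)) %>% as_tibble()
df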
I want to select specific variables using `select()` and `matches()` from the `dplyr` package.
The regex should satisfy the following condition: variable names should not contain both `age` and `_` at the same time.
My idea was to first match the variable names that contain both `age` and `_`, and then negate that selection, but it failed. For example:
df %>% select(!matches('age&_'))
The final result should look like this:
df_expected = df %>% select(`8.ageupwith65`, `9.agelo65`, `10.PM2_5`)
Any help will be highly appreciated!
CodePudding user response:
You may use
> df %>% select(!matches('age[0-9]+_'))
# A tibble: 1 x 3
`8.ageupwith65` `9.agelo65` `10.PM2_5`
<lgl> <lgl> <lgl>
1 NA NA NA
This expression matches `age`, one or more digits (`[0-9]+`), and then an underscore. The selection is then negated by the `!` operator, so only the columns that do not match are kept.
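As for why the original attempt returned every column: regex has no "and" operator, so `'age&_'` looks for the literal text `age&_`, which no column name contains. If you want the more general rule from the question (a name contains both `age` and `_`, in any position), here is a minimal sketch of two alternatives. It assumes the same `df` as in the question; the names `res_helpers` and `res_regex` are only illustrative. The first relies on the `&` and `!` operators that `select()` accepts between selection helpers, the second spells out both orderings in a single regex.

```r
library(dplyr)

a <- c('1.age41_50', '2.age51_60', '3.age61_70', '4.age71_80',
       '5.age1_20', '6.age21_30', '7.age31_40', '8.ageupwith65',
       '9.agelo65', '10.PM2_5')
df <- matrix(ncol = 10, nrow = 1, dimnames = list(NULL, a)) %>% as_tibble()

# Option 1: intersect two helpers with `&`, then negate the intersection
res_helpers <- df %>% select(!(contains("age") & contains("_")))

# Option 2: one regex covering both orders:
# "age" followed later by "_", or "_" followed later by "age"
res_regex <- df %>% select(!matches("age.*_|_.*age"))

names(res_helpers)
names(res_regex)
```

Both should keep the same three columns as `df_expected`. Unlike `'age[0-9]+_'`, they do not require digits between `age` and the underscore, so pick whichever rule matches your real column names.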