Selecting columns based on exact string

df1 <- data.frame(x1_modhigh_2020 = 1,
                  x2_modhigh_2030 = 1,
                  x1_low_2020 = 1,
                  x2_low_2030 = 1,
                  x1_high_2020 = 1,
                  x2_high_2030 = 1)

In a for-loop I want to select columns based on whether they contain 'low', 'modhigh' or 'high' and do some operations on them. My method of selecting columns is:

  library(dplyr)  
  df1 %>% dplyr::select(contains("low")) # this works    
  df1 %>% dplyr::select(contains("modhigh")) # this works    
  df1 %>% dplyr::select(contains("high")) # does not work: this also selects `modhigh`

How can I modify the selection of high so that modhigh does not get selected as well?

CodePudding user response:

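With matches you can use regex syntax. contains only looks for a literal substring and does not allow regex, which is why contains("high") also picks up modhigh. Including the leading underscore, as in _high, excludes modhigh, because in those names "high" is preceded by "d" rather than "_". The pipe | is a regex metacharacter signifying alternation, so the pattern here selects the low and high columns together: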

df1 %>%
  select(matches("_high|low"))
  x1_low_2020 x2_low_2030 x1_high_2020 x2_high_2030
1           1           1            1            1
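
For the for-loop described in the question, one possible sketch is to wrap each keyword in underscores so that it can only match a whole name component. This assumes every column name follows the prefix_level_year layout of df1; the names keywords, pattern and selected are made up here for illustration and are not from the original post.

```r
library(dplyr)

df1 <- data.frame(x1_modhigh_2020 = 1,
                  x2_modhigh_2030 = 1,
                  x1_low_2020 = 1,
                  x2_low_2030 = 1,
                  x1_high_2020 = 1,
                  x2_high_2030 = 1)

# Hypothetical helper objects, chosen for this sketch only
keywords <- c("low", "modhigh", "high")
selected <- list()

for (kw in keywords) {
  # "_high_" cannot match inside "x1_modhigh_2020", because the
  # character before "high" there is "d", not "_"
  pattern <- paste0("_", kw, "_")
  selected[[kw]] <- df1 %>% select(matches(pattern))
}

# Column names picked up for the "high" keyword
names(selected[["high"]])
```

Each element of selected is a data frame holding only the columns for that keyword, so the per-group operations can be done inside the loop or afterwards on the list.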

CodePudding user response:

I would also use the matches selection helper proposed by @Chris, but if you are interested in alternatives:

# dplyr
dplyr::select(df1, grep("_high|low", colnames(df1)))

# base R
df1[, grep("_high|low", colnames(df1))]

Both result in

  x1_low_2020 x2_low_2030 x1_high_2020 x2_high_2030
1           1           1            1            1
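
One caveat with the base R form: if the pattern happens to match exactly one column, df1[, idx] returns a plain vector instead of a data frame. A small sketch of guarding against that with drop = FALSE follows; the pattern "x1_high" and the object names are chosen here only to force a single match and are not from the original answers.

```r
df1 <- data.frame(x1_modhigh_2020 = 1,
                  x2_modhigh_2030 = 1,
                  x1_low_2020 = 1,
                  x2_low_2030 = 1,
                  x1_high_2020 = 1,
                  x2_high_2030 = 1)

# Illustrative pattern that matches exactly one column
idx <- grep("x1_high", colnames(df1))

one_col_vec <- df1[, idx]                # collapses to a numeric vector
one_col_df  <- df1[, idx, drop = FALSE]  # stays a one-column data frame

is.data.frame(one_col_vec)
is.data.frame(one_col_df)
```

Inside a loop where the number of matching columns can vary, keeping drop = FALSE means later steps can always treat the result as a data frame.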