selecting columns based on exact string

Time:09-10

df1 <- data.frame(x1_modhigh_2020 = 1,
                  x2_modhigh_2030 = 1,
                  x1_low_2020 = 1,
                  x2_low_2030 = 1,
                  x1_high_2020 = 1,
                  x2_high_2030 = 1)

In a for-loop I want to select columns based on whether they contain 'low', 'modhigh' or 'high' and do some operations on them. My method of selecting columns is:

  library(dplyr)  
  df1 %>% dplyr::select(contains("low")) # this works    
  df1 %>% dplyr::select(contains("modhigh")) # this works    
  df1 %>% dplyr::select(contains("high")) # does not work: this also selects `modhigh`
  

How can I modify the selection of high so that modhigh does not get selected as well?

CodePudding user response:

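contains looks for a literal substring, so "high" also matches inside "modhigh", and it does not accept regex. matches does accept regex syntax, so you can include the leading underscore to pin down high and use the pipe |, the regex metacharacter for alternation, to select several groups at once: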

df1 %>%
  select(matches("_high|low"))
  x1_low_2020 x2_low_2030 x1_high_2020 x2_high_2030
1           1           1            1            1
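
For the for-loop in the question, one option is to build the pattern from each level name. The following is a minimal sketch, assuming every column name has the form <variable>_<level>_<year> as in df1; the list name selected is introduced here for illustration only:

```r
library(dplyr)

df1 <- data.frame(x1_modhigh_2020 = 1,
                  x2_modhigh_2030 = 1,
                  x1_low_2020 = 1,
                  x2_low_2030 = 1,
                  x1_high_2020 = 1,
                  x2_high_2030 = 1)

# Wrapping each level in underscores makes the match exact:
# "_high_" cannot match inside "_modhigh_".
selected <- list()
for (level in c("low", "modhigh", "high")) {
  selected[[level]] <- df1 %>% select(matches(paste0("_", level, "_")))
}

names(selected$high)
```

Each element of selected then holds only the columns for one level, so the per-level operations can be done inside the same loop.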

CodePudding user response:

I would also use the matches selection helper proposed by @Chris, but if you are interested in alternatives:

# dplyr
dplyr::select(df1, grep("_high|low", colnames(df1)))

# base R
df1[, grep("_high|low", colnames(df1))]

Both result in

  x1_low_2020 x2_low_2030 x1_high_2020 x2_high_2030
1           1           1            1            1
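
If the column names always follow the <variable>_<level>_<year> layout (an assumption based on df1), another base R option avoids regex altogether: split each name on the underscore and compare the middle piece exactly. A sketch, with level_of being a helper name made up for this example:

```r
df1 <- data.frame(x1_modhigh_2020 = 1,
                  x2_modhigh_2030 = 1,
                  x1_low_2020 = 1,
                  x2_low_2030 = 1,
                  x1_high_2020 = 1,
                  x2_high_2030 = 1)

# Split "x1_high_2020" into c("x1", "high", "2020") and keep the second piece.
level_of <- vapply(strsplit(colnames(df1), "_", fixed = TRUE),
                   function(parts) parts[2], character(1))

# Exact comparison, so "modhigh" is never mistaken for "high".
high_cols <- df1[, level_of == "high", drop = FALSE]
names(high_cols)
```

This would break if a variable name itself contained an underscore, in which case the regex approach is the safer choice.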