filter dataframe with multiple conditions name matching in R dplyr

Time:12-22

type sex eth a_t a_tm b_tm c_tm d_tm e_tm
   1   m   a   0    0    0    1    1    0
   0   f   b   1    1    0    0    1    1
   0   m   a   0    0    0    1    1    1
   1   f   a   1    1    1    1    0    0
   0   f   c   1    0    0    1    0    1

How can I select columns using dplyr where the column ends with _tm or the column is in a list containing sex or eth?

Expected output:

sex eth  a_tm b_tm c_tm d_tm e_tm 
m   a    0    0    1    1    0
f   b    1    0    0    1    1
m   a    0    0    1    1    1
f   a    1    1    1    0    0
f   c    0    0    1    0    1

I want to do this in dplyr without using grepl... is this possible?

CodePudding user response:

This can be done with a select helper, ends_with(), inside select():

library(dplyr)
df1 %>% 
    select(sex, eth, ends_with('_tm'))

-output

  sex eth a_tm b_tm c_tm d_tm e_tm
1   m   a    0    0    1    1    0
2   f   b    1    0    0    1    1
3   m   a    0    0    1    1    1
4   f   a    1    1    1    0    0
5   f   c    0    0    1    0    1
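Since the question mentions names held in a list, here is a minimal sketch of that variant. It assumes the names are stored in a character vector (keep_cols is a hypothetical name, not from the question) and rebuilds the same df1 so the snippet runs on its own. all_of() raises an error if a listed column is missing, while any_of() silently skips missing names.

```r
library(dplyr)

# Same values as df1 in the "data" section of this answer
df1 <- data.frame(
  type = c(1L, 0L, 0L, 1L, 0L),
  sex  = c("m", "f", "m", "f", "f"),
  eth  = c("a", "b", "a", "a", "c"),
  a_t  = c(0L, 1L, 0L, 1L, 1L),
  a_tm = c(0L, 1L, 0L, 1L, 0L),
  b_tm = c(0L, 0L, 0L, 1L, 0L),
  c_tm = c(1L, 0L, 1L, 1L, 1L),
  d_tm = c(1L, 1L, 1L, 0L, 0L),
  e_tm = c(0L, 1L, 1L, 0L, 1L)
)

# Hypothetical vector holding the column names to keep
keep_cols <- c("sex", "eth")

# Columns listed in keep_cols, plus every column whose name ends in "_tm"
res <- df1 %>%
  select(all_of(keep_cols), ends_with("_tm"))

names(res)
```

Swapping all_of(keep_cols) for any_of(keep_cols) may be preferable when the vector can contain names that are not present in every data frame.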

Another option is matches("_tm$") in place of ends_with(). With a regex, everything can be done in a single matches() call: matches("^(sex|eth)$|_tm$"). The pattern matches column names that are exactly 'sex' or (|) 'eth', anchored from the start (^) to the end ($) of the string, or (|) names that have the substring '_tm' at the end ($) of the string.

df1 %>%
    select(matches("^(sex|eth)$|_tm$"))

-output

  sex eth a_tm b_tm c_tm d_tm e_tm
1   m   a    0    0    1    1    0
2   f   b    1    0    0    1    1
3   m   a    0    0    1    1    1
4   f   a    1    1    1    0    0
5   f   c    0    0    1    0    1
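As a quick check, the two approaches can be compared on column names alone. The sketch below uses a one-row stand-in data frame (toy is a hypothetical name) with the same column names as df1; it is only meant to show that the "_tm$" anchor excludes a_t and that 'type' matches neither alternative.

```r
library(dplyr)

# One-row stand-in with the same column names as df1 (values are arbitrary)
toy <- data.frame(
  type = 1L, sex = "m", eth = "a", a_t = 0L,
  a_tm = 0L, b_tm = 0L, c_tm = 1L, d_tm = 1L, e_tm = 0L
)

by_helper <- toy %>% select(sex, eth, ends_with("_tm"))
by_regex  <- toy %>% select(matches("^(sex|eth)$|_tm$"))

# Both selections keep the same columns in the same order
identical(names(by_helper), names(by_regex))

# "a_t" lacks the full "_tm" suffix, so the regex does not pick it up
"a_t" %in% names(by_regex)
```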

data

df1 <- structure(list(type = c(1L, 0L, 0L, 1L, 0L), sex = c("m", "f", 
"m", "f", "f"), eth = c("a", "b", "a", "a", "c"), a_t = c(0L, 
1L, 0L, 1L, 1L), a_tm = c(0L, 1L, 0L, 1L, 0L), b_tm = c(0L, 0L, 
0L, 1L, 0L), c_tm = c(1L, 0L, 1L, 1L, 1L), d_tm = c(1L, 1L, 1L, 
0L, 0L), e_tm = c(0L, 1L, 1L, 0L, 1L)), class = "data.frame", 
row.names = c(NA, -5L))