I am trying to use dplyr to drop columns from a data.frame where the column name contains a substring anywhere except the start of the name (i.e. at any index other than the first).
After looking around (pun intended), it appears that this is usually accomplished by including a lookbehind assertion in the regular expression I pass to dplyr::matches() within the dplyr::select() call. I'm not familiar with how these work, and my attempt at implementing this below throws an error.
Am I implementing the lookbehind incorrectly, or is this a limitation of the regular expressions I can pass to matches()? I welcome a working solution.
library(dplyr)
# Example data
df <- data.frame(bar = rnorm(1),
                 foo1 = rnorm(1),
                 bar_foo1 = rnorm(1),
                 bar_foo1_bat = rnorm(1))
# Desired output
df %>% select(bar, foo1)
#> bar foo1
#> 1 1.057651 -0.1526598
# Successfully drops columns with "foo1" anywhere
df %>% select(-matches(".*foo1.*"))
#> bar
#> 1 1.057651
# Both fail to drop columns with "foo1" anywhere *except the start of the string*
df %>% select(-matches("(?<!^).*foo1.*"))
#> Warning in grep(needle, haystack, ...): TRE pattern compilation error 'Invalid
#> regexp'
#> Error in `select()`:
#> ! invalid regular expression '(?<!^).*foo1.*', reason 'Invalid regexp'
df %>% select(-matches("(?<!^)foo1.*"))
#> Warning in grep(needle, haystack, ...): TRE pattern compilation error 'Invalid
#> regexp'
#> Error in `select()`:
#> ! invalid regular expression '(?<!^)foo1.*', reason 'Invalid regexp'
Created on 2022-07-14 by the reprex package (v2.0.1)
CodePudding user response:
You need one or more of any character (.) before foo1, so you could write ^.{1,} at the beginning of the pattern.
df %>% dplyr::select(-matches("^.{1,}foo1"))
# bar foo1
# 1 -1.077056 -0.5649875
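To see which names this pattern catches, here is a minimal sketch using base R's grepl() on the column names from the question. The variable names nms, hits and kept are made up for illustration. Note that grepl() is case-sensitive by default while matches() ignores case, which makes no difference for these names.

```r
# Column names from the question's example data
nms <- c("bar", "foo1", "bar_foo1", "bar_foo1_bat")

# TRUE where at least one character precedes "foo1"
hits <- grepl("^.{1,}foo1", nms)

# Names that would survive select(-matches("^.{1,}foo1"))
kept <- nms[!hits]
print(kept)
#> [1] "bar"  "foo1"
```

Because ^.{1,} must consume at least one character before foo1, a name that starts with foo1 and contains it nowhere else cannot match and is kept.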
CodePudding user response:
df %>% select(-matches('^.+foo1'))
# bar foo1
# 1 1.806521 -0.9380235
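As for the lookbehind itself: the "TRE pattern compilation error" in the question suggests the pattern was compiled by R's default TRE engine, which does not support lookaround assertions. As far as I know, matches() accepts a perl argument that switches to PCRE, so the original lookbehind idea can be kept. This is a sketch under that assumption, with fixed values in place of rnorm() and a made-up result name res:

```r
library(dplyr)

# Same column names as the question, with fixed values for reproducibility
df <- data.frame(bar = 1, foo1 = 2, bar_foo1 = 3, bar_foo1_bat = 4)

# perl = TRUE asks for the PCRE engine, which understands (?<!^),
# i.e. "not preceded by the start of the string"
res <- df %>% select(-matches("(?<!^)foo1", perl = TRUE))

names(res)
```

The simpler ^.+foo1 pattern avoids the engine question entirely, so the lookbehind is only worth it if you prefer that style.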