I am trying to filter rows where DEV appears as the first, a middle, or the last word of a string using the grepl function, but the filter also picks up words like HEV along with DEV (the intended match).
library(dplyr)

Airport_ID <- c("3001", "3002", "3003", "3004")
Airport_Name <- c("DEV Adelaide DTSUpdated", "HEV Brisbane HEV Land Airport Land ADTS",
                  "DEVAST Washington INC Airport DTSUpdated", "DALLAS DEVASTAirport HEV INCUpdated")
dfu <- data.frame(Airport_ID, Airport_Name)

# Three separate conditions: DEV at the start, in the middle, or at the end
Filter_Data_F <- dfu %>%
  dplyr::filter(grepl("^DEV", Airport_Name, fixed = F) |
                grepl(" \\DEV\\ ", Airport_Name, fixed = F) |
                grepl("DEV$", Airport_Name, fixed = F))
CodePudding user response:
\\D has a special meaning in regex: it matches any character that is not a digit. So in the second condition it matches a non-digit character (H) followed by EV, hence you get HEV in the output.
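To see this in isolation, here is a minimal base R sketch on the strings from the question. The pattern is simplified to " \\DEV " (the trailing escaped space is written as a plain space), which is an assumption made for readability, and the variable names are only illustrative.

```r
# Airport names copied from the question
x <- c("DEV Adelaide DTSUpdated",
       "HEV Brisbane HEV Land Airport Land ADTS",
       "DEVAST Washington INC Airport DTSUpdated",
       "DALLAS DEVASTAirport HEV INCUpdated")

# "\\D" is the class "any non-digit", so this pattern reads:
# space, one non-digit character, "EV", space
hits_nondigit <- grepl(" \\DEV ", x)
hits_nondigit
# FALSE  TRUE FALSE  TRUE

# Show the text that actually matched: it is " HEV ", not " DEV "
matched <- regmatches(x, regexpr(" \\DEV ", x))
matched
# " HEV " " HEV "

# A literal D, which is what was intended, matches none of these strings
hits_literal <- grepl(" DEV ", x)
hits_literal
# FALSE FALSE FALSE FALSE
```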
Secondly, grepl has fixed = FALSE by default, so you can omit that argument.
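As a quick illustration of what the fixed argument changes, the sketch below compares the default regex interpretation with fixed = TRUE on the question's strings. The variable names are only illustrative.

```r
# Airport names copied from the question
x <- c("DEV Adelaide DTSUpdated",
       "HEV Brisbane HEV Land Airport Land ADTS",
       "DEVAST Washington INC Airport DTSUpdated",
       "DALLAS DEVASTAirport HEV INCUpdated")

# Default (fixed = FALSE): "^" is an anchor meaning "start of string"
as_regex <- grepl("^DEV", x)
as_regex
# TRUE FALSE  TRUE FALSE

# fixed = TRUE: the pattern is a literal string, so grepl looks for
# the four characters "^DEV", which never occur in these names
as_literal <- grepl("^DEV", x, fixed = TRUE)
as_literal
# FALSE FALSE FALSE FALSE
```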
Also, you do not need separate grepl calls joined with |. A single grepl call should do it.
library(dplyr)
dfu %>% dplyr::filter(grepl('DEV', Airport_Name))
# Airport_ID Airport_Name
#1 3001 DEV Adelaide DTSUpdated
#2 3003 DEVAST Washington INC Airport DTSUpdated
#3 3004 DALLAS DEVASTAirport HEV INCUpdated
If you want to match DEV exactly, so that DEVAST does not match, use word boundaries (\\b).
dfu %>% dplyr::filter(grepl('\\bDEV\\b', Airport_Name))
# Airport_ID Airport_Name
#1 3001 DEV Adelaide DTSUpdated
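Note that \\b treats any non-word character, including punctuation, as a boundary. If DEV should only count when it is delimited by spaces or by the start or end of the string (the three cases in the question), one possible alternative is a single pattern with alternation. The sketch below uses base R subsetting, and the strings in extra are hypothetical examples that do not appear in the question.

```r
Airport_ID <- c("3001", "3002", "3003", "3004")
Airport_Name <- c("DEV Adelaide DTSUpdated",
                  "HEV Brisbane HEV Land Airport Land ADTS",
                  "DEVAST Washington INC Airport DTSUpdated",
                  "DALLAS DEVASTAirport HEV INCUpdated")
dfu <- data.frame(Airport_ID, Airport_Name)

# (^| ) : start of string or a space
# ( |$) : a space or end of string
pattern <- "(^| )DEV( |$)"

res <- dfu[grepl(pattern, dfu$Airport_Name), ]
res
#   Airport_ID            Airport_Name
# 1       3001 DEV Adelaide DTSUpdated

# Hypothetical strings (assumed for illustration only)
extra <- c("Perth DEV", "Sydney DEV Hub", "DEV-2 Darwin")

space_delimited <- grepl(pattern, extra)
space_delimited
# TRUE  TRUE FALSE

word_boundary <- grepl("\\bDEV\\b", extra)
word_boundary
# TRUE TRUE TRUE
```

Which of the two is preferable depends on whether names such as "DEV-2" should count as a match.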