I have a data.table and I want to determine whether any of a set of codes is present in a specific character column. I pass the codes to %like% as a single pattern string, with the values separated by "|", as illustrated below. This syntax works for me; however, I would like to force the %like% function to treat each code literally, i.e. not to use the . as a regex wildcard. The data.table manual says that the like function accepts fixed = TRUE. Is there a way I can force my code, using %like%, to treat the . and .. as literal characters rather than wildcards? Thanks. J
This works, but treats "." as a regex wildcard:
Codes <- c("65E..|9OX..|9OX1.|9OX2.|9OX3.|9OXZ.|Xaa9G")
ActualCodes <- dt[code_id %like% Codes]
This does not work, since fixed = TRUE is passed to the data.table [ call instead of to %like%:
Codes <- c("65E..|9OX..|9OX1.|9OX2.|9OX3.|9OXZ.|Xaa9G")
ActualCodes <- dt[code_id %like% Codes, fixed = TRUE]
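The root cause can be reproduced without data.table at all. The sketch below uses base R's grepl(), which, as far as I know, is what like() calls for character columns; the values in codes are made-up sample data. It shows that "." matches any character in regex mode, that fixed = TRUE makes it literal, and that fixed = TRUE also makes "|" literal, so a "|"-joined string cannot be used in fixed mode.

```r
# Made-up sample values, for illustration only
codes <- c("65E..", "65EXX", "Xaa9G")

# Regex mode: "." matches any single character, so "65EXX" matches too
regex_hits <- grepl("65E..", codes)

# Fixed mode: the pattern is taken literally, so only "65E.." matches
fixed_hits <- grepl("65E..", codes, fixed = TRUE)

# Fixed mode also makes "|" literal, so this alternation string matches nothing
alt_fixed_hits <- grepl("65E..|Xaa9G", codes, fixed = TRUE)

print(regex_hits)     # TRUE TRUE FALSE
print(fixed_hits)     # TRUE FALSE FALSE
print(alt_fixed_hits) # FALSE FALSE FALSE
```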
Answer:
If you look at the help page for ?'%like%', you should see that there are two forms of like. The one you are attempting to use is the infix, two-argument version, which does not accept a fixed = TRUE argument. I did try to use the prefix version, like(), but failed (with fixed = TRUE the "|" separators are treated literally as well, so a single alternation string cannot match anything). What did work was to side-step the "fixed" strategy and instead use a character-class approach to match exactly-periods:
library(data.table)
DT = data.table(Name=c("65E..","65EXX","Xaa9G"), Salary=c(2,3,4))
DT
#---------------
# Name Salary
#1: 65E.. 2
#2: 65EXX 3
#3: Xaa9G 4
DT[Name %like% "^Mar"] # the example was copied from the help page
#Empty data.table (0 rows and 2 cols): Name,Salary
Codes <- c("65E..|9OX..|9OX1.|9OX2.|9OX3.|9OXZ.|Xaa9G")
DT[ Name %like% Codes]
# Name Salary
#1: 65E.. 2
#2: 65EXX 3 # WRONG, try again
#3: Xaa9G 4
Codes <- gsub("[.]", "[.]", Codes, fixed = TRUE) # doesn't succeed: fixed = TRUE searches for the literal text "[.]"
Codes
#[1] "65E..|9OX..|9OX1.|9OX2.|9OX3.|9OXZ.|Xaa9G" # unchanged, so the dots are still wildcards
Codes <- gsub("[.]", "[.]", Codes) # without "fixed", the pattern [.] is a character class matching a literal period
Codes
#[1] "65E[.][.]|9OX[.][.]|9OX1[.]|9OX2[.]|9OX3[.]|9OXZ[.]|Xaa9G"
DT[ Name %like% Codes]
# --- correct result----
# Name Salary
#1: 65E.. 2
#2: Xaa9G 4 # SUCCESS
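If the codes are kept as a character vector rather than one "|"-joined string, both a regex route and a fully literal route become straightforward. This is a minimal sketch, assuming a data.table version that provides %flike% (the infix shorthand for like(..., fixed = TRUE); check ?like in your installation). The names code_vec, pattern, hits, via_regex and via_fixed are hypothetical. Option 1 builds the same character-class pattern as above, applying fixed = TRUE to the gsub() pattern "." rather than to "[.]"; it only neutralises ".", so codes containing other regex metacharacters would need more escaping. Option 2 avoids regular expressions entirely by running one literal match per code and OR-ing the results.

```r
library(data.table)

DT <- data.table(Name = c("65E..", "65EXX", "Xaa9G"), Salary = c(2, 3, 4))

# Hypothetical: the codes kept as a character vector, one element per code
code_vec <- c("65E..", "9OX..", "9OX1.", "9OX2.", "9OX3.", "9OXZ.", "Xaa9G")

# Option 1: rewrite every literal "." as "[.]", then join with "|" for %like%.
# fixed = TRUE applies to gsub()'s pattern ".", so it matches a literal dot.
pattern <- paste(gsub(".", "[.]", code_vec, fixed = TRUE), collapse = "|")
via_regex <- DT[Name %like% pattern]

# Option 2: no regex at all. One fixed (literal) match per code via %flike%,
# then combine the logical vectors element-wise with OR.
hits <- Reduce(`|`, lapply(code_vec, function(p) DT$Name %flike% p))
via_fixed <- DT[hits]

print(pattern)
print(via_regex) # rows 65E.. and Xaa9G
print(via_fixed) # same rows
```

Option 2 scans the column once per code, so it is simple and strictly literal but may be slower than a single regex on very large tables.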