Can I force R data.table %like% to use "fixed = TRUE"?

Time:11-30

I have a data.table and want to find the rows in which a specific character column contains any of a set of codes. I pass the codes to %like% as a single pattern string, with the alternatives separated by |, as illustrated. This syntax works for me; however, I would like to force %like% to treat each code as literal, i.e. not use the . as a regex wildcard. The data.table manual says that the like function accepts fixed = TRUE. Is there a way I can force my code, using %like%, to treat the . and .. as literal characters rather than wildcards? Thx. J

This works but treats "." incorrectly as a wildcard:

Codes <- c("65E..|9OX..|9OX1.|9OX2.|9OX3.|9OXZ.|Xaa9G")

ActualCodes <- dt[code_id %like% Codes] 

This does not:

Codes <- c("65E..|9OX..|9OX1.|9OX2.|9OX3.|9OXZ.|Xaa9G")

ActualCodes <- dt[code_id %like% Codes, fixed = TRUE] 

CodePudding user response:

If you look at the help page ?'%like%' you should see that there are two forms of like. The one that you are attempting to use is the infix, two-argument version, which does not accept a fixed = TRUE argument. I did try the function form, like(), with fixed = TRUE but failed, presumably because a fixed pattern also treats the | separators as literal characters. What did work was to side-step the "fixed" strategy and instead use a character class, [.], to match "exactly periods":

library(data.table)
DT = data.table(Name=c("65E..","65EXX","Xaa9G"), Salary=c(2,3,4))
DT
#---------------
#    Name Salary
#1: 65E..      2
#2: 65EXX      3
#3: Xaa9G      4


 
 DT[Name %like% "^Mar"]  # the example was copied from the help page
#Empty data.table (0 rows and 2 cols): Name,Salary
 Codes <- c("65E..|9OX..|9OX1.|9OX2.|9OX3.|9OXZ.|Xaa9G")
 DT[ Name %like%  Codes]
 #   Name Salary
#1: 65E..      2
#2: 65EXX      3       # WRONG, try again
#3: Xaa9G      4  

 
 Codes <- gsub("[.]", "[.]", Codes, fixed = TRUE)  # doesn't succeed: with fixed = TRUE the pattern is the literal text "[.]"
 Codes
#[1] "65E..|9OX..|9OX1.|9OX2.|9OX3.|9OXZ.|Xaa9G"  # unchanged, gsub found no matches to replace

 Codes <- gsub("[.]", "[.]", Codes)   # remove "fixed", character class succeeds
 Codes
#[1] "65E[.][.]|9OX[.][.]|9OX1[.]|9OX2[.]|9OX3[.]|9OXZ[.]|Xaa9G"
 DT[ Name %like%  Codes]
# --- correct result----
#    Name Salary
#1: 65E..      2
#2: Xaa9G      4   # SUCCESS
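
For comparison, here is a minimal sketch of two other ways to get literal matching. It assumes a data.table version whose like() accepts fixed = TRUE (as the manual quoted in the question describes); the names code_vec, hits and escaped are only illustrative. The first approach splits the combined string into one code per element and ORs together one literal match per code, so the | never reaches the matcher. The second escapes every period in one step by passing a plain "." as the pattern to gsub with fixed = TRUE.

```r
library(data.table)

DT <- data.table(Name = c("65E..", "65EXX", "Xaa9G"), Salary = c(2, 3, 4))
Codes <- "65E..|9OX..|9OX1.|9OX2.|9OX3.|9OXZ.|Xaa9G"

# Approach 1: one literal (fixed) match per code, combined with OR.
# Split on the literal "|" so each element is a single code.
code_vec <- strsplit(Codes, "|", fixed = TRUE)[[1]]
hits <- Reduce(`|`, lapply(code_vec, function(p) like(DT$Name, p, fixed = TRUE)))
DT[hits]

# Approach 2: keep %like% and the single regex, but escape the periods.
# With fixed = TRUE the pattern "." is a literal period, not a wildcard.
escaped <- gsub(".", "[.]", Codes, fixed = TRUE)
DT[Name %like% escaped]
```

Both calls should return only the 65E.. and Xaa9G rows. Note that either way the codes are still matched as substrings of Name, just as with the original %like% call.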