I want to partially match every string in one list against another list, then build a data frame that shows each abbreviated name next to its proper name.
I'm sure this is easy but I haven't been able to find it yet.
For example:
library(data.table)
list_abbreviated = c("KF Chicken", "CHI Wendys", "CAL InandOut")
list_proper = c("Kentucky Fried Chicken", "Chicago Wendys", "California InandOut", "Ontario Whataburger")
# I've tried
Pattern = paste(list_proper, collapse="|")
DT_result = data.table(list_abbreviated, result=grepl(Pattern, list_abbreviated ))
DT_result
# This is the result
   list_abbreviated result
1:       KF Chicken  FALSE
2:       CHI Wendys  FALSE
3:     CAL InandOut  FALSE
# I tried other options using %like% to no avail either.
# This is the output I am looking for
  list_abbreviated result            list_proper
1       KF Chicken   TRUE Kentucky Fried Chicken
2       CHI Wendys   TRUE         Chicago Wendys
3     CAL InandOut   TRUE    California InandOut
Answer:
One option is to extract the last word of each abbreviated name and use it as the key for a partial join. regex_inner_join from fuzzyjoin can then match that word as a regular expression against the proper names and merge the two data tables.
library(data.table)
library(stringi)
library(fuzzyjoin)
list_abbreviated = data.table(list_abbreviated = c("KF Chicken", "CHI Wendys", "CAL InandOut"))
# Keep only the last word of each abbreviated name ("Chicken", "Wendys", "InandOut")
list_abbreviated[, limited := stri_extract_last_words(list_abbreviated)]
list_proper = data.table(list_proper = c("Kentucky Fried Chicken", "Chicago Wendys", "California InandOut", "Ontario Whataburger"))
# Keep the rows where list_proper matches the regex stored in limited
DT_result <- data.table(regex_inner_join(list_proper, list_abbreviated, by = c("list_proper" = "limited")))
# Drop the helper column once the join is done
DT_result[, limited := NULL]
Output
              list_proper list_abbreviated
1: Kentucky Fried Chicken       KF Chicken
2:         Chicago Wendys       CHI Wendys
3:    California InandOut     CAL InandOut
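
If you would rather not add a dependency, the same idea can probably be sketched in base R. The attempt in the question returns FALSE everywhere because the pattern is built from the full proper names, and none of those appears inside the shorter abbreviated strings; the search has to go the other way. This is only a sketch, not the fuzzyjoin approach above: it assumes the last word of each abbreviated name appears verbatim in exactly the proper name you want, and last_word and match_proper are names made up for the example. It also produces the result column asked for in the question.

```r
list_abbreviated <- c("KF Chicken", "CHI Wendys", "CAL InandOut")
list_proper <- c("Kentucky Fried Chicken", "Chicago Wendys",
                 "California InandOut", "Ontario Whataburger")

# Assumption: the last word of the abbreviated name identifies the proper name.
# Strip everything up to and including the last whitespace character.
last_word <- sub(".*[[:space:]]", "", list_abbreviated)

# For each last word, take the first proper name that contains it (NA if none does).
match_proper <- vapply(last_word, function(w) {
  hits <- grep(w, list_proper, fixed = TRUE, value = TRUE)
  if (length(hits) > 0) hits[1] else NA_character_
}, character(1), USE.NAMES = FALSE)

DT_result <- data.frame(list_abbreviated = list_abbreviated,
                        result = !is.na(match_proper),
                        list_proper = match_proper)
DT_result
```

If several proper names share the same last word, this keeps only the first hit, so check for duplicates before relying on it.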