Partial String Match based on a list

Time:04-12

I want to partial-match every string in one list against a second list, then create a data frame that shows each abbreviated name next to its proper name.

I'm sure this is easy but I haven't been able to find it yet.

For example:


library(data.table)


list_abbreviated = c("KF Chicken", "CHI Wendys", "CAL InandOut")

list_proper = c("Kentucky Fried Chicken", "Chicago Wendys", "California InandOut", "Ontario Whataburger")

# I've tried

Pattern = paste(list_proper, collapse="|")

DT_result = data.table(list_abbreviated, result=grepl(Pattern, list_abbreviated ))
DT_result

# This is the result

   list_abbreviated result
1:       KF Chicken  FALSE
2:       CHI Wendys  FALSE
3:     CAL InandOut  FALSE

# I tried other options using %like% to no avail either.

# This is the output I am looking for

  list_abbreviated result            list_proper
1       KF Chicken   TRUE Kentucky Fried Chicken
2       CHI Wendys   TRUE         Chicago Wendys
3     CAL InandOut   TRUE    California InandOut

CodePudding user response:

One option is to extract the last word of each abbreviated name and use it as the key for a partial join. regex_inner_join from fuzzyjoin can then merge the two data tables, keeping every row where the proper name matches that last word as a regular expression.
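A quick base R check shows why the last word is a workable key. This is an illustrative sketch, not part of the original answer, and the variable names full_in_abbrev and word_in_full are made up here. The full proper name never appears inside the abbreviation, which is why the grepl attempt in the question returns FALSE, while the last word of the abbreviation does appear inside the proper name.

```r
# Direction used in the question: is the full name a substring of the abbreviation?
full_in_abbrev <- grepl("Kentucky Fried Chicken", "KF Chicken", fixed = TRUE)

# Reverse direction: is the last word of the abbreviation a substring of the full name?
word_in_full <- grepl("Chicken", "Kentucky Fried Chicken", fixed = TRUE)

full_in_abbrev
word_in_full
```

The join that follows applies this second check to every pair of rows.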

library(data.table)
library(stringi)
library(fuzzyjoin)

# Abbreviated names, plus a helper column holding the last word of each name
list_abbreviated = data.table(list_abbreviated = c("KF Chicken", "CHI Wendys", "CAL InandOut"))
list_abbreviated[, limited := stri_extract_last_words(list_abbreviated)]

list_proper = data.table(list_proper = c("Kentucky Fried Chicken", "Chicago Wendys", "California InandOut", "Ontario Whataburger"))

# Keep the rows where list_proper matches the regex stored in limited
DT_result <- data.table(regex_inner_join(list_proper, list_abbreviated, by = c("list_proper" = "limited")))
DT_result[, limited := NULL]  # drop the helper column
DT_result

Output

              list_proper list_abbreviated
1: Kentucky Fried Chicken       KF Chicken
2:         Chicago Wendys       CHI Wendys
3:    California InandOut     CAL InandOut
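
If installing fuzzyjoin is not an option, the same idea can be sketched in base R. This is an alternative under stated assumptions, not the method from the answer above: it assumes the last word of each abbreviated name appears literally in the proper name, and it keeps only the first proper name that contains it. The names last_word and match_proper are introduced here for illustration. It also adds the TRUE/FALSE result column from the question's desired output.

```r
list_abbreviated <- c("KF Chicken", "CHI Wendys", "CAL InandOut")
list_proper <- c("Kentucky Fried Chicken", "Chicago Wendys",
                 "California InandOut", "Ontario Whataburger")

# Last whitespace-separated word of each abbreviated name
last_word <- sub("^.*[[:space:]]", "", list_abbreviated)

# For each last word, take the first proper name that contains it (NA if none)
match_proper <- vapply(last_word, function(w) {
  hits <- list_proper[grepl(w, list_proper, fixed = TRUE)]
  if (length(hits) > 0) hits[1] else NA_character_
}, character(1), USE.NAMES = FALSE)

result <- data.frame(
  list_abbreviated,
  result = !is.na(match_proper),
  list_proper = match_proper,
  stringsAsFactors = FALSE
)
result
```

An abbreviated name with no match stays in the table with result = FALSE and list_proper = NA, whereas the inner join above would drop it.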