In R, find full or partial matches of a string in one column with a second column

Time:06-23

I need to compare an 8 to 10-character string in a column with another 8 to 10-character string in a second column, and add an asterisk to the first string if at least the first 8 digits match. I can manage an exact match, but I don't know how to manage the partial matches.

Can someone please help? I have the below code:

tl<-c("10012908","1001290810","10111090")
trqs<-as.data.frame(tl)

tl<-c("10012908","10012910")
mfn<-as.data.frame(tl)

# Appends "*" only when the whole string matches exactly
for (i in seq_len(nrow(trqs))) {
  if (trqs$tl[i] %in% mfn$tl) {
    trqs$tl[i] <- paste0(trqs$tl[i], "*")
  }
}

#the result should be:
trqs$tl<-c("10012908*","1001290810*","10111090")

CodePudding user response:

tl <- c("10012908", "1001290810", "10111090")
trqs <- as.data.frame(tl)
trqs$tl1 <- c("10012908", "1001290810", "1090")
# Rows of tl1 that contain any value of tl as a substring (regex alternation)
idx <- grep(paste(trqs$tl, collapse = "|"), trqs$tl1)
trqs[idx, "tl1"] <- paste0(trqs[idx, "tl1"], "*")
trqs
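
The same regular-expression idea can also be applied to the two data frames from the question. The following is only a minimal sketch: it assumes the codes contain digits only (so nothing in the pattern needs escaping), and the names prefixes, pattern and hit are introduced here for illustration.

```r
# Data from the question, rebuilt so the block is self-contained
trqs <- data.frame(tl = c("10012908", "1001290810", "10111090"),
                   stringsAsFactors = FALSE)
mfn  <- data.frame(tl = c("10012908", "10012910"),
                   stringsAsFactors = FALSE)

# Anchored alternation of the 8-character prefixes found in mfn
# (assumption: digits only, so no regex escaping is required)
prefixes <- unique(substr(mfn$tl, 1, 8))
pattern  <- paste0("^(", paste(prefixes, collapse = "|"), ")")

# TRUE where a trqs code starts with one of those prefixes
hit <- grepl(pattern, trqs$tl)
trqs$tl[hit] <- paste0(trqs$tl[hit], "*")
trqs$tl
```

Anchoring with ^ restricts the match to the start of each code, so a prefix that only appears in the middle of a longer code is not marked.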

CodePudding user response:

Here is another approach using substr to match the first 8 characters.

First, create a vector mfn_match containing the first 8 characters of each value in the mfn data.frame column of interest. This only needs to be computed once.

Next, create a logical vector trqs_match for those where the first 8 characters of tl in trqs match at least one element of mfn_match. For those matches, add an asterisk.

mfn_match <- substr(mfn$tl, 1, 8)
trqs_match <- substr(trqs$tl, 1, 8) %in% mfn_match
trqs$tl[trqs_match] <- paste0(trqs$tl[trqs_match], "*")
trqs

Output

           tl
1   10012908*
2 1001290810*
3    10111090
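
If the comparison is needed in more than one place, the steps above could be wrapped in a small function. This is a sketch rather than a definitive implementation: mark_prefix_matches and its arguments n and mark are hypothetical names, not part of base R or of the original answer.

```r
# Hypothetical helper: append `mark` to every code whose first `n`
# characters equal the first `n` characters of any reference code
mark_prefix_matches <- function(codes, reference, n = 8, mark = "*") {
  codes <- as.character(codes)            # guards against factor columns
  ref_prefix <- substr(as.character(reference), 1, n)
  hit <- substr(codes, 1, n) %in% ref_prefix
  codes[hit] <- paste0(codes[hit], mark)
  codes
}

# Usage with the values from the question
res <- mark_prefix_matches(c("10012908", "1001290810", "10111090"),
                           c("10012908", "10012910"))
res
```

Both substr and %in% are vectorized, so no explicit loop is needed. Note that calling the function twice on the same column would append a second asterisk, because the first 8 characters of an already marked code still match.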