I need to compare an 8- to 10-character string in one column with an 8- to 10-character string in a second column, and append an asterisk to the first string if at least the first 8 characters match. I can handle an exact match, but I don't know how to handle partial matches.
Can someone please help? I have the code below:
tl<-c("10012908","1001290810","10111090")
trqs<-as.data.frame(tl)
tl<-c("10012908","10012910")
mfn<-as.data.frame(tl)
for(i in 1:nrow(trqs)){
  if(trqs$tl[i] %in% mfn$tl){
    trqs$tl[i] <-paste0(trqs$tl[i],"*")
  }
}
#the result should be:
trqs$tl<-c("10012908*","1001290810*","10111090")
CodePudding user response:
tl<-c("10012908","1001290810","10111090")
trqs<-as.data.frame(tl)
trqs$tl1<-c("10012908","1001290810","1090")
# rows of tl1 that contain any value of tl as a substring
idx<-grep(paste(trqs$tl,collapse="|"),trqs$tl1)
trqs[idx,"tl1"]<-paste0(trqs[idx,"tl1"],"*")
trqs
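Note that the code above compares tl with a second column tl1 rather than with mfn, and the pattern is unanchored, so it matches anywhere in the string. A minimal sketch of the same regex idea applied to the two data frames from the question, assuming the codes contain only digits (so nothing needs escaping in the pattern); the names pattern and hits are introduced here for illustration only:

```r
# Data from the question; stringsAsFactors = FALSE keeps the columns as character
trqs <- data.frame(tl = c("10012908", "1001290810", "10111090"),
                   stringsAsFactors = FALSE)
mfn  <- data.frame(tl = c("10012908", "10012910"),
                   stringsAsFactors = FALSE)

# Build one alternation of the 8-character prefixes in mfn,
# anchored with ^ so that only the start of each string can match
pattern <- paste0("^(", paste(unique(substr(mfn$tl, 1, 8)), collapse = "|"), ")")

# TRUE for every row of trqs whose code starts with one of those prefixes
hits <- grepl(pattern, trqs$tl)

# Append the asterisk to the matching rows only
trqs$tl[hits] <- paste0(trqs$tl[hits], "*")
trqs
```

With many codes in mfn the alternation can get long, in which case comparing prefixes with substr and %in% is probably the simpler route.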
CodePudding user response:
Here is another approach using substr to match the first 8 characters.
First, create a vector mfn_match that contains the first 8 characters of each value in the mfn data.frame column of interest. That way the prefixes are computed only once.
Next, create a logical vector trqs_match that is TRUE where the first 8 characters of tl in trqs match at least one element of mfn_match. For those matches, add an asterisk.
mfn_match <- substr(mfn$tl, 1, 8)
trqs_match <- sapply(trqs$tl, function(x) substr(x, 1, 8) %in% mfn_match)
trqs$tl[trqs_match] <- paste0(trqs$tl[trqs_match], "*")
trqs
Output
           tl
1   10012908*
2 1001290810*
3    10111090
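Since substr and %in% are both vectorized, the sapply call is not strictly needed. A self-contained sketch of the same idea without it, using the data from the question (this is a variant, not a change to the logic above):

```r
# Data from the question; stringsAsFactors = FALSE keeps the columns as character
trqs <- data.frame(tl = c("10012908", "1001290810", "10111090"),
                   stringsAsFactors = FALSE)
mfn  <- data.frame(tl = c("10012908", "10012910"),
                   stringsAsFactors = FALSE)

# First 8 characters of every code in mfn, computed once
mfn_match <- substr(mfn$tl, 1, 8)

# substr() works on the whole column at once, so no sapply() is required
trqs_match <- substr(trqs$tl, 1, 8) %in% mfn_match

# Add the asterisk only where the 8-character prefix was found
trqs$tl[trqs_match] <- paste0(trqs$tl[trqs_match], "*")
trqs
```

One difference from the sapply version: trqs_match here is a plain logical vector without names, which makes no difference for the indexing step.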