Home > Blockchain >  Python fuzzy match
Python fuzzy match

Time:09-15

I have 2 columns in google sheets, I am trying to all values in A that partially match values in Column B

I tried =VLOOKUP(G5&"*",F5:F10,1,FALSE)

also this

=IFERROR(VLOOKUP(A2:A&"*",B2:B,1,FALSE),VLOOKUP(B2:B&"*",A2:A,1,FALSE))

But not getting correct values

Tried with Python (colab)

import pandas as pd
data=pd.read_csv('/delta.csv')
Df=pd.DataFrame(data[['Current','Combined']])
pip install fuzzywuzzy
from fuzzywuzzy import fuzz
from fuzzywuzzy import process

df3=df2.apply(lambda x:difflib.get_close_matches(x,df1))

getting error TypeError: object of type 'float' has no len()

ColA ColB result
alpha.roadsgs.eval alpha.roadsgs EXISTS
alpha.roadsgs.eval.90943.highways EXISTS
alpha.roadsgs.eval.5492.nonhighways EXISTS
alpha.roadsgs.eval.23232.highways EXISTS
alpha.roadsgs.eval.004545.nonhighways EXISTS
alpha.roadsgs.eval.005324.nonhighways EXISTS

Adding more data

ColA ColB result
aaabc.eval.moc abcde EXISTS
abcde.eval abc.123 EXISTS
def.gcd.xyz def.gc EXISTS
abc.123.moc xyz123.eval.moc.facebook.google EXISTS
xyz123.eval.moc google.facebook.apple.chromebook EXISTS
google.facebook.apple

CodePudding user response:

try:

=ARRAYFORMULA(REGEXMATCH(A1:A6, B1))

enter image description here


update:

=ARRAYFORMULA(REGEXMATCH(A1:A12, TEXTJOIN("|", 1, B:B)))

enter image description here

CodePudding user response:

Use this formula

=ARRAYFORMULA({ 
 "Result"; 
 IF(IF(A2:A="",,
 REGEXMATCH(A2:A, 
 SUBSTITUTE(REGEXREPLACE(QUERY(
 QUERY({B2:B} ,
 " Where Col1 is not null ")&"|", "", 9^9),
 ".\z", "")," ", "")))<>TRUE,,"EXISTS")})

enter image description here

  • Related