Apply fuzzy ratio to two dataframes

Time:10-09

I have two dataframes and want to fuzzy-compare the address strings in one against the address strings in the other, by applying my function across both:

sample1 = pd.DataFrame(data1.sample(n=200, random_state=42))
sample2 = pd.DataFrame(data2.sample(n=200, random_state=13))

def get_ratio(row):
    sample1 = row['address']
    sample2 = row['address']
    return fuzz.token_set_ratio(sample1, sample2)
match = data[data.apply(get_ratio, axis=1) >= 78] #I want to apply get_ratio to both sample1 and sample2
no_matched = data[data.apply(get_ratio, axis=1) <= 77] #I want to apply get_ratio to both sample1 and sample2

Thanks in advance for your help!

CodePudding user response:

You need the Cartesian product of the two address columns (every address in one dataframe paired with every address in the other) rather than a row-wise apply on a single dataframe. Then score each pair. You can find a similar question here.

In your case, first build all the pairs:

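import itertools
import pandas as pd
combs = list(itertools.product(data1["address"], data2["address"]))  # every (data1, data2) address pair
combs = pd.DataFrame(combs)  # columns are labelled 0 and 1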

Then score each pair:

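from thefuzz import fuzz  # assumed library; fuzzywuzzy and rapidfuzz offer a function of the same name
combs['score'] = combs.apply(lambda x: fuzz.token_set_ratio(x[0], x[1]), axis=1)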

Now filter on the score to separate the pairs that matched from those that did not, for example score >= 78 versus score < 78 with the threshold from your question.
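
The steps above can be put together roughly as follows. This is only a sketch: the sample addresses, the column names address1/address2 and the helper score_pair are made up for illustration, and since the question does not say which library fuzz comes from, thefuzz is assumed, with a difflib stand-in (not equivalent to token_set_ratio) if it is not installed.

```python
import itertools

import pandas as pd

try:
    from thefuzz import fuzz  # assumption: the question does not name its fuzzy library

    def score_pair(a, b):
        return fuzz.token_set_ratio(a, b)
except ImportError:
    from difflib import SequenceMatcher

    def score_pair(a, b):
        # Stand-in scorer (0-100) so the sketch still runs without thefuzz;
        # it is NOT equivalent to token_set_ratio.
        return round(100 * SequenceMatcher(None, a, b).ratio())

# Hypothetical sample data standing in for data1 / data2.
data1 = pd.DataFrame({"address": ["12 Main Street", "5 Oak Avenue"]})
data2 = pd.DataFrame({"address": ["12 Main Street", "77 Pine Road"]})

# Every (data1 address, data2 address) pair, with named columns.
combs = pd.DataFrame(
    list(itertools.product(data1["address"], data2["address"])),
    columns=["address1", "address2"],
)
combs["score"] = [
    score_pair(a, b) for a, b in zip(combs["address1"], combs["address2"])
]

THRESHOLD = 78  # cut-off taken from the question
match = combs[combs["score"] >= THRESHOLD]
no_match = combs[combs["score"] < THRESHOLD]
```

Every pair lands in exactly one of match or no_match, so the two frames together always have len(data1) * len(data2) rows.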

I advise cleaning the addresses first (e.g. lower-casing them and removing duplicates). Otherwise the computation can take a very long time, because the number of pairs to score is the product of the two dataframe lengths.
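
A minimal sketch of that clean-up, assuming the column is called address as in the question. The helper clean_addresses and the sample rows are hypothetical, and trimming whitespace is an extra step beyond what is suggested above.

```python
import pandas as pd

# Hypothetical raw addresses with case/whitespace variants and a repeat.
data1 = pd.DataFrame(
    {"address": ["12 Main Street", "12 MAIN STREET ", "5 Oak Avenue", "5 Oak Avenue"]}
)


def clean_addresses(df, column="address"):
    """Return the unique, lower-cased, whitespace-trimmed values of one column."""
    cleaned = df[column].str.lower().str.strip()
    return cleaned.drop_duplicates().reset_index(drop=True)


unique1 = clean_addresses(data1)
print(len(data1), "->", len(unique1))  # prints: 4 -> 2
```

Doing this on both dataframes before building the pairs shrinks the product on both sides, so the saving multiplies.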
