apply defined function to column pandas and fuzzywuzzy

Time:11-13

I am using the fuzzywuzzy library to match strings against a reference list using Levenshtein distance. I want to apply a function to a Series that compares each value with the values in the reference list: if the similarity ratio with a reference value is high enough, it returns the reference value; otherwise it returns the original value from the Series.

The function looks like this:

import pandas as pd
from fuzzywuzzy import fuzz

ref_list = ['SOBEYS', 'WHOLE FOODS', 'FOODLAND', 'LOBLAWS', 'SAFEWAY']

def clean(row, ref_list):
    for ref in ref_list:
        simil = fuzz.ratio(row, ref)
        if (simil > 35):
            return ref
        elif (simil < 25):
            return row

I created the test DataFrame below and the function works fine on it, but when I apply it to the whole dataset I get TypeError: object of type 'float' has no len().

I can't figure out why it works on the sample dataset but not on the whole (original) dataset.

Any help is appreciated. Thank you in advance!

lis = ['FOODLAND',
 'THORNBURY FOODLAND',
 'JOANNE S PLACE NO WED DEL',
 'SOBEYS',
 'SOBEYS',
 'SOBEYS',
 'SOBEYS',
 'SOBEYS',
 'SOBEYS TIMBERLEA',
 'SOBEYS']


data = pd.DataFrame(lis, columns=['retailer'])

data['match'] = data['retailer'].apply(lambda x: clean(x, ref_list))


Answer:

The error is fairly self-explanatory. Here's a way to reproduce it:

import pandas as pd
from fuzzywuzzy import fuzz

# sample data: one string and one float in the same column
f = pd.DataFrame({'col': ['SOBEYS ABC', 2.0]})
f['col'].apply(lambda x: fuzz.ratio(x, 'ABC'))

     43 @functools.wraps(func)
     44 def decorator(*args, **kwargs):
---> 45     if len(args[0]) == 0 or len(args[1]) == 0:
     46         return 0
     47     return func(*args, **kwargs)

TypeError: object of type 'float' has no len()
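To find which values in the full dataset trigger the error, one option is to list the rows whose value is not a str before applying clean(). Here is a minimal sketch. The small frame below is hypothetical sample data standing in for the original dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for the full dataset
data = pd.DataFrame({'retailer': ['SOBEYS', np.nan, 'FOODLAND', 2.0]})

# True for every value that is not a Python str (e.g. NaN or a number)
not_str = ~data['retailer'].map(lambda x: isinstance(x, str))
bad_rows = data[not_str]
print(bad_rows)
```

On the real data, replacing the sample frame with the original DataFrame shows exactly which rows the sample dataset did not contain.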

Basically, your column has a float value in it, typically a missing value (NaN), which pandas stores as a float. One way to fix it is to convert the column to str:

data['match'] = data['retailer'].astype(str).apply(lambda x: clean(x, ref_list))