Python Error: 'float' object has no attribute 'replace'

I am an R user who is trying to learn more about Python.

I found this Python library that I would like to use for address parsing: https://github.com/zehengl/ez-address-parser

I was able to try an example over here:

from ez_address_parser import AddressParser

ap = AddressParser()

result = ap.parse("290 Bremner Blvd, Toronto, ON M5V 3L9")
print(result)
[('290', 'StreetNumber'), ('Bremner', 'StreetName'), ('Blvd', 'StreetType'), ('Toronto', 'Municipality'), ('ON', 'Province'), ('M5V', 'PostalCode'), ('3L9', 'PostalCode')]

I have the following file that I imported:

import pandas as pd

df = pd.read_csv(r'C:/Users/me/OneDrive/Documents/my_file.csv', encoding='latin-1')

   name                               address
1 name1 290 Bremner Blvd, Toronto, ON M5V 3L9
2 name2 291 Bremner Blvd, Toronto, ON M5V 3L9
3 name3 292 Bremner Blvd, Toronto, ON M5V 3L9

I then applied the above function and exported the file, and everything worked:

df['Address_Parse'] = df['ADDRESS'].apply(ap.parse)

df = pd.DataFrame(df)
df.to_csv(r'C:/Users/me/OneDrive/Documents/python_file.csv', index=False, header=True)

Problem: I now have another file (similar format) - but this time, I am getting an error:

df1 = pd.read_csv(r'C:/Users/me/OneDrive/Documents/my_file1.csv',  encoding='latin-1')
df1['Address_Parse'] = df1['ADDRESS'].apply(ap.parse)

AttributeError: 'float' object has no attribute 'replace'

I am confused as to why the same code will not work for this file. As I am still learning Python, I am not sure where to begin to debug this problem. My guesses are that perhaps there are special characters in the second file, formatting issues or incorrect variable types that are preventing this ap.parse function from working, but I am still not sure.

Can someone please show me what to do?

Thank you!

CodePudding user response：

Looking at the library's source code, the AddressParser class has a parse method, which in turn calls a tokenize function:

# method of AddressParser
def parse(self, address):
    if not self.crf:
        raise RuntimeError("Model is not loaded")

    tokens = tokenize(address)
    labels = self.crf.predict([transform(address)])[0]
    return list(zip(tokens, labels))

# module-level helper called by parse
def tokenize(s):
    s = s.replace("#", " # ")
    return [token for token in split(fr"[{puncts}\s]+", s) if token]

We can see here that tokenize calls s.replace(), so that is most likely where your error comes from. replace is a string method, so tokenize expects a str. When it receives a float instead, Python raises the AttributeError you are seeing.
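The failure can be reproduced without the library at all. The sketch below uses a hypothetical stand-in, tokenize_first_step, that copies only the first line of tokenize. It is not the library's function, just enough to trigger the same error on a float.

```python
# Hypothetical stand-in that mirrors only the first line of tokenize.
def tokenize_first_step(s):
    return s.replace("#", " # ")

# NaN is an ordinary Python float; pandas uses it to mark a missing value.
missing = float("nan")

try:
    tokenize_first_step(missing)
except AttributeError as exc:
    message = str(exc)

print(message)
```

The printed message should match the one in your traceback, which supports the idea that a float is reaching the parser.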

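So your ADDRESS column likely contains floats where the parser expects strings. The usual cause is an empty cell: pandas reads it as NaN, and NaN is a float. The tokenize function could arguably handle that case more gracefully, but for now the fix has to happen on your side.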
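To confirm this before changing anything, you could list the rows whose address is not a string. This is a minimal sketch on made-up data (the CSV text below is an assumption standing in for my_file1.csv); with your real file, you would run the last three lines against your own df1.

```python
import io

import pandas as pd

# Made-up sample data; the second row has an empty address cell.
csv_text = (
    "NAME,ADDRESS\n"
    'name1,"290 Bremner Blvd, Toronto, ON M5V 3L9"\n'
    "name2,\n"
    'name3,"292 Bremner Blvd, Toronto, ON M5V 3L9"\n'
)
df1 = pd.read_csv(io.StringIO(csv_text))

# True for every row whose ADDRESS value is not a Python str.
not_text = df1["ADDRESS"].map(lambda value: not isinstance(value, str))
bad_rows = df1[not_text]
print(bad_rows)
```

If bad_rows is empty for your file, the cause is something else and the traceback is worth a closer look.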

You should be able to resolve this by converting your ADDRESS column to strings (pandas reports the dtype of such a column as 'object').

df1['string_address'] = df1['ADDRESS'].astype(str)
df1['Address_Parse'] = df1['string_address'].apply(ap.parse)
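One caveat: in pandas releases before 3.0, astype(str) turns a missing value into the literal text 'nan', which the parser then treats as if it were an address. An alternative is to parse only real strings and skip everything else. The sketch below assumes a hypothetical helper, parse_if_text, and uses fake_parse as a stand-in for ap.parse so that it runs without the library installed.

```python
def parse_if_text(parse, value):
    """Call parse(value) for non-empty strings; return an empty list otherwise."""
    if isinstance(value, str) and value.strip():
        return parse(value)
    return []


# Stand-in for ap.parse; it only splits on whitespace and labels nothing.
def fake_parse(address):
    return [(token, "Unknown") for token in address.split()]


parsed = parse_if_text(fake_parse, "290 Bremner Blvd")
skipped = parse_if_text(fake_parse, float("nan"))

print(parsed)
print(skipped)
```

With the real parser, this would be applied as df1['ADDRESS'].apply(lambda v: parse_if_text(ap.parse, v)). Returning an empty list for missing addresses is a design choice; None would work just as well if that suits your export better.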

CodePudding user response：

You can try reading every column of the CSV file as a string by passing dtype=str:

df1 = pd.read_csv(r'C:/Users/me/OneDrive/Documents/my_file1.csv', encoding='latin-1', dtype=str)
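Note that dtype=str on its own does not change how empty cells are read: they still come back as NaN, so the same error can reappear. Passing keep_default_na=False as well makes read_csv keep them as empty strings. The sketch below shows the difference on made-up data (the CSV text is an assumption, not your file).

```python
import io

import pandas as pd

# Made-up sample data; the second row has an empty address cell.
csv_text = "NAME,ADDRESS\nname1,290 Bremner Blvd\nname2,\n"

# dtype=str alone: the empty cell is still read as NaN.
as_str = pd.read_csv(io.StringIO(csv_text), dtype=str)
still_missing = int(as_str["ADDRESS"].isna().sum())

# dtype=str plus keep_default_na=False: the empty cell becomes "".
no_na = pd.read_csv(io.StringIO(csv_text), dtype=str, keep_default_na=False)
empty_text = no_na["ADDRESS"].iloc[1]

print(still_missing)
print(repr(empty_text))
```

Whether ap.parse handles an empty string gracefully is worth checking on a single value before applying it to the whole column.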