I am an R user trying to learn more about Python.
I found this Python library that I would like to use for address parsing: https://github.com/zehengl/ez-address-parser
I was able to try an example over here:
from ez_address_parser import AddressParser
ap = AddressParser()
result = ap.parse("290 Bremner Blvd, Toronto, ON M5V 3L9")
print(result)
[('290', 'StreetNumber'), ('Bremner', 'StreetName'), ('Blvd', 'StreetType'), ('Toronto', 'Municipality'), ('ON', 'Province'), ('M5V', 'PostalCode'), ('3L9', 'PostalCode')]
I have the following file that I imported:
import pandas as pd
df = pd.read_csv(r'C:/Users/me/OneDrive/Documents/my_file.csv', encoding='latin-1')
   name   address
1  name1  290 Bremner Blvd, Toronto, ON M5V 3L9
2  name2  291 Bremner Blvd, Toronto, ON M5V 3L9
3  name3  292 Bremner Blvd, Toronto, ON M5V 3L9
I tried to apply the above function and export the file:
df['Address_Parse'] = df['address'].apply(ap.parse)
df = pd.DataFrame(df)
df.to_csv(r'C:/Users/me/OneDrive/Documents/python_file.csv', index=False, header=True)
This seems to have worked, but the whole parsed result ends up in a single column:
[('290', 'StreetNumber'), ('Bremner', 'StreetName'), ('Blvd', 'StreetType'), ('Toronto', 'Municipality'), ('ON', 'Province'), ('M5V', 'PostalCode'), ('3L9', 'PostalCode')]
Is there a way in Python to make each of these "elements" (e.g. StreetNumber, StreetName, etc.) into a separate column?
Thank you!
CodePudding user response:
Define a custom function that returns a Series, then join the output to the original DataFrame:
def parse(x):
    # ap.parse returns (token, label) pairs; use the labels as the Series index
    return pd.Series({k: v for v, k in ap.parse(x)})
out = df.join(df['address'].apply(parse))
print(out)
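In the example address the parser labels both 'M5V' and '3L9' as PostalCode, so a plain dictionary keeps only one of them. Below is a minimal sketch of one way to keep both by joining tokens that share a label. The helper name tokens_to_record and its sep parameter are hypothetical (they are not part of ez-address-parser), and the token list is copied from the question rather than produced by a live parser call.

```python
def tokens_to_record(tokens, sep=" "):
    """Collapse (token, label) pairs into {label: text}.

    Tokens that share a label are joined with `sep` instead of
    overwriting each other.
    """
    record = {}
    for token, label in tokens:
        if label in record:
            record[label] = record[label] + sep + token
        else:
            record[label] = token
    return record


# Parser output copied from the question (no parser call needed here).
tokens = [
    ('290', 'StreetNumber'), ('Bremner', 'StreetName'), ('Blvd', 'StreetType'),
    ('Toronto', 'Municipality'), ('ON', 'Province'),
    ('M5V', 'PostalCode'), ('3L9', 'PostalCode'),
]

record = tokens_to_record(tokens)
print(record['PostalCode'])  # M5V 3L9
```

If this behaviour is what you want, the dictionary comprehension inside parse could probably be swapped for pd.Series(tokens_to_record(ap.parse(x))). Multi-word street or city names would be joined the same way, assuming the parser gives each word the same label.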
CodePudding user response:
If you use pd.DataFrame.apply, you don't have to remember to convert the result into a Series. Instead, you can pass axis=1 and result_type='expand'.
Given:
# df
   name   address
0  name1  290 Bremner Blvd, Toronto, ON M5V 3L9
Doing:
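def parse_address(row):
    # Map each label to its token; 'expand' turns the dict keys into columns
    return {k: v for v, k in ap.parse(row.address)}
df = df.join(df.apply(parse_address, axis=1, result_type='expand'))
# OR something like this would also work:
def parse_address(row):
    # Keep only the tokens, in the order the parser returns them
    return [x[0] for x in ap.parse(row.address)]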
new_cols = [
'StreetNumber',
'StreetName',
'StreetType',
'Municipality',
'Province',
'PostalCode',
'PostalCode'
]
df[new_cols] = df.apply(parse_address, axis=1, result_type='expand')
Outputs:
# Method 1
   name   address                                Municipality  PostalCode  Province  StreetName  StreetNumber  StreetType
0  name1  290 Bremner Blvd, Toronto, ON M5V 3L9  Toronto       3L9         ON        Bremner     290           Blvd
# Method 2
   name   address                                StreetNumber  StreetName  StreetType  Municipality  Province  PostalCode
0  name1  290 Bremner Blvd, Toronto, ON M5V 3L9  290           Bremner     Blvd        Toronto       ON        3L9
As for dictionary comprehension:
# This:
out = {k:v for v,k in [('a', 'b')]}
# Is like writing this:
out = {}
for v, k in [('a', 'b')]:
    out[k] = v
# Both result in:
{'b': 'a'}
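The loop form also shows what happens when the same key appears twice: each assignment to out[k] replaces the previous value, so the last pair wins. That is why PostalCode shows only 3L9 in the outputs above. A small illustration, using just the two postal code tokens from the question:

```python
# Two tokens share the label 'PostalCode'
tokens = [('M5V', 'PostalCode'), ('3L9', 'PostalCode')]

# The later value replaces the earlier one for the same key
out = {k: v for v, k in tokens}
print(out)  # {'PostalCode': '3L9'}
```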