I have a pandas dataframe column province
which contains USA states and Canada province
names in both uppercase and lowercase
df.province
Output
MANITOBA
Alberta
CA
OH
A dictionary to map all Canada province values
ca_provinces_dic = {
'Newfoundland and Labrador': 'NL',
'Prince Edward Island': 'PE',
'Nova Scotia': 'NS',
'New Brunswick': 'NB',
'Quebec': 'QC',
'Ontario': 'ON',
'Manitoba': 'MB',
'Saskatchewan': 'SK',
'Alberta': 'AB',
'British Columbia': 'BC',
'Yukon': 'YT',
'Northwest Territories': 'NT',
'Nunavut': 'NU',
}
How to map all the province
values with their respective short form using pandas that handles the case matching
issue as well?
For example -
I want to compare all the province column values
(in lowercase) with all the dictionary keys
(in lowercase) and based on the match I will apply the appropriate short form for the province column values (only where applicable)
My current attempt to this problem.
canada_provinces_dic = {
'Newfoundland and Labrador': 'NL',
'Prince Edward Island': 'PE',
'Nova Scotia': 'NS',
'New Brunswick': 'NB',
'Quebec': 'QC',
'Ontario': 'ON',
'Manitoba': 'MB',
'Saskatchewan': 'SK',
'Alberta': 'AB',
'British Columbia': 'BC',
'Yukon': 'YT',
'Northwest Territories': 'NT',
'Nunavut': 'NU',
}
def handle_state(data_attr):
for k, v in canada_provinces_dic.items():
if data_attr.strip().lower() == k.lower():
return canada_provinces_dic[k]
else:
return data_attr
df['state'] = df.state.apply(handle_state)
CodePudding user response:
Use Series.map
and Series.fillna
:
First convert your dict to keys in lowercase:
In [638]: ca_provinces_dic = {k.lower():v for k,v in ca_provinces_dic.items()}
In [659]: df['province'] = df['province'].str.lower().map(ca_province_dict).fillna(df['province'])
In [660]: df
Out[660]:
province
0 MB
1 AB
2 CA
3 OH