I have the following Pandas DF:
ID Country
----------
01 "it"
02 "es"
03 "de"
04 "ch"
05 "in"
06 "ca"
I want to replace the 2-letter country codes with the corresponding continent names, like this:
ID Country
----------
01 "europe"
02 "europe"
03 "europe"
04 "asia"
05 "asia"
06 "america"
I have built a dict whose keys are continent names and whose values are lists of the country codes belonging to each continent:
> country_dict
{'europe': ['it', 'es', 'de', 'gb'],
'asia': ['in', 'ch', 'ru'],
'america': ['us', 'ca']}
The best I could do so far:
for continent in country_dict:
    df['Country'] = df['Country'].replace(country_dict[continent], continent)
but this seems inelegant. Is there a better way?
CodePudding user response:
Your dict is backwards: replace needs a mapping from each country code to its continent, so invert it first.
>>> import pandas as pd
>>> df = pd.DataFrame(['it', 'es'], columns=['Country'])
>>> df
   Country
0       it
1       es
>>> country_dict = {'europe': ['it', 'es', 'de', 'gb'],
...                 'asia': ['in', 'ch', 'ru'],
...                 'america': ['us', 'ca']}
>>> country_dict = {v: k for k, vs in country_dict.items() for v in vs}
>>> country_dict
{'it': 'europe', 'es': 'europe', 'de': 'europe', 'gb': 'europe', 'in': 'asia', 'ch': 'asia', 'ru': 'asia', 'us': 'america', 'ca': 'america'}
>>> df.replace(country_dict)
   Country
0   europe
1   europe
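For reference, here is the same idea as a standalone script run against the full example from the question. It is only a sketch: the name code_to_continent is my own choice, and the ID column is assumed to hold strings so the leading zeros survive.

```python
import pandas as pd

# Example frame from the question (IDs assumed to be strings).
df = pd.DataFrame({
    "ID": ["01", "02", "03", "04", "05", "06"],
    "Country": ["it", "es", "de", "ch", "in", "ca"],
})

country_dict = {
    "europe": ["it", "es", "de", "gb"],
    "asia": ["in", "ch", "ru"],
    "america": ["us", "ca"],
}

# Invert the dict: one entry per country code, pointing at its continent.
code_to_continent = {
    code: continent
    for continent, codes in country_dict.items()
    for code in codes
}

# replace returns a new Series, so assign the result back to the column.
df["Country"] = df["Country"].replace(code_to_continent)
print(df)
```

Assigning back to df["Country"] matters here, because replace does not modify the column in place by default.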
CodePudding user response:
You can invert country_dict so that each country code maps to its continent, and then use pandas.Series.map:
>>> dct = {v:k for k,val in country_dict.items() for v in val}
>>> df['Country'] = df['Country'].map(dct)
>>> df
   Country
0   europe
1   europe
2   europe
3     asia
4     asia
5  america
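One difference between the two approaches is worth knowing: map turns any code missing from the dict into NaN, while replace leaves it unchanged. A small sketch of that behaviour, where "xx" is a made-up code and "unknown" is an arbitrary fallback label, neither of which comes from the question:

```python
import pandas as pd

country_dict = {
    "europe": ["it", "es", "de", "gb"],
    "asia": ["in", "ch", "ru"],
    "america": ["us", "ca"],
}
code_to_continent = {
    code: continent
    for continent, codes in country_dict.items()
    for code in codes
}

# "xx" is a hypothetical code that is not present in the mapping.
codes = pd.Series(["it", "in", "xx"])

mapped = codes.map(code_to_continent)        # missing code becomes NaN
replaced = codes.replace(code_to_continent)  # missing code is kept as "xx"

# If NaN is not wanted, fill it with a fallback label after mapping.
with_default = codes.map(code_to_continent).fillna("unknown")

print(mapped)
print(replaced)
print(with_default)
```

Which behaviour is preferable depends on the data: NaN makes unmapped codes easy to spot, whereas replace keeps the original value in place.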