I am trying to bring data from a dataframe that acts as a mapping table into another dataframe using the code below, but I get the error name 'x' is not defined. What am I doing wrong?
Note: for values not in the mapping table (China/CN), I would like the value to be blank or NaN. If there are values in the mapping table that are not in my data, I don't want to include them.
import pandas as pd
languages = {'Language': ["English", "German", "French", "Spanish"],
             'countryCode': ["EN", "DE", "FR", "ES"]
             }
countries = {'Country': ["Australia", "Argentina", "Mexico", "Algeria", "China"],
             'countryCode': ["EN", "ES", "ES", "FR", "CN"]
             }
language_map = pd.DataFrame(languages)
data = pd.DataFrame(countries)
def language_converter(x):
    return language_map.query(f"countryCode=='{x}'")['Language'].values[0]
data['Language'] = data['countryCode'].apply(language_converter(x))
Answer:
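A left merge on the shared countryCode column keeps every row of data, ignores mapping rows no country uses, and leaves Language as NaN where there is no match:
data.merge(language_map, how='left')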
Output:
Country countryCode Language
0 Australia EN English
1 Argentina ES Spanish
2 Mexico ES Spanish
3 Algeria FR French
4 China CN NaN
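A slightly more explicit variant of the same merge, as a sketch: naming the key with on="countryCode" and passing indicator=True adds a _merge column that shows which rows found no match, and validate="many_to_one" is an optional check that raises a MergeError if a code appears more than once in the mapping table. The names merged, unmatched and result are only illustrative.

```python
import pandas as pd

language_map = pd.DataFrame({
    "Language": ["English", "German", "French", "Spanish"],
    "countryCode": ["EN", "DE", "FR", "ES"],
})
data = pd.DataFrame({
    "Country": ["Australia", "Argentina", "Mexico", "Algeria", "China"],
    "countryCode": ["EN", "ES", "ES", "FR", "CN"],
})

# Left join on an explicit key: every row of `data` is kept,
# and mapping rows nobody uses (DE) are dropped.
merged = data.merge(language_map, on="countryCode", how="left",
                    validate="many_to_one", indicator=True)

# Rows marked "left_only" had no match in the mapping table.
unmatched = merged.loc[merged["_merge"] == "left_only", "countryCode"].tolist()
print(unmatched)  # ['CN']

# Drop the helper column to get the same shape as the plain merge.
result = merged.drop(columns="_merge")
print(result)
```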
Answer:
.apply expects a callable, but you passed language_converter(x), which is a function call: Python evaluates it before apply runs, and at that point the variable x is not defined.
The valid usage is .apply(language_converter).
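A minimal illustration of the difference, using a hypothetical to_lower helper on a small Series: passing the function object works, and so does a lambda if you want to keep the x, while calling the function inside .apply fails before pandas ever sees it.

```python
import pandas as pd

codes = pd.Series(["EN", "ES", "FR"])


def to_lower(code):
    # Hypothetical helper, used only to show how .apply invokes a function.
    return code.lower()


# Correct: pass the function object; pandas calls it once per element.
lowered = codes.apply(to_lower)

# Also correct: the lambda's parameter is named x, so x exists when the call runs.
lowered_lambda = codes.apply(lambda x: to_lower(x))

# Wrong: to_lower(x) is evaluated before .apply is called, so Python looks up
# a variable named x immediately and raises NameError.
try:
    codes.apply(to_lower(x))
except NameError as err:
    error_message = str(err)

print(lowered.tolist())  # ['en', 'es', 'fr']
print(error_message)     # the message names the undefined variable 'x'
```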
Next, however, you'll hit another error, IndexError: index 0 is out of bounds for axis 0 with size 0, because some country codes (such as CN) are not in the mapping table, so the filter returns an empty array and the indexing .values[0] fails.
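To see where that IndexError comes from, here is a small sketch that filters the mapping table for CN directly: the boolean filter selects nothing, and taking position 0 of an empty NumPy array raises exactly that error. It uses .to_numpy() so the result is guaranteed to be a plain NumPy array; the names matches and error_text are illustrative.

```python
import pandas as pd

language_map = pd.DataFrame({
    "Language": ["English", "German", "French", "Spanish"],
    "countryCode": ["EN", "DE", "FR", "ES"],
})

# No row has countryCode == "CN", so the filter selects nothing.
matches = language_map.loc[language_map["countryCode"] == "CN", "Language"].to_numpy()
print(matches.size)  # 0

# Taking element 0 of an empty array is what .values[0] does in the question.
try:
    first = matches[0]
except IndexError as err:
    error_text = str(err)

print(error_text)  # index 0 is out of bounds for axis 0 with size 0
```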
If you stay with your original approach, a working version looks like this:
import numpy as np

def language_converter(x):
    # Languages for this code; an empty array if the code is not in the table
    lang = language_map[language_map["countryCode"] == x]['Language'].values
    return lang[0] if lang.size > 0 else np.nan

data['Language'] = data['countryCode'].apply(language_converter)
print(data)
Country countryCode Language
0 Australia EN English
1 Argentina ES Spanish
2 Mexico ES Spanish
3 Algeria FR French
4 China CN NaN
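If you want to keep a converter function, a cheaper variant (sketched here, not the original answer's code) is to build a plain dict once and use dict.get with a default, so missing codes such as CN fall back to NaN without filtering the DataFrame on every call. The name code_to_language is illustrative.

```python
import numpy as np
import pandas as pd

language_map = pd.DataFrame({
    "Language": ["English", "German", "French", "Spanish"],
    "countryCode": ["EN", "DE", "FR", "ES"],
})
data = pd.DataFrame({
    "Country": ["Australia", "Argentina", "Mexico", "Algeria", "China"],
    "countryCode": ["EN", "ES", "ES", "FR", "CN"],
})

# Build the lookup once: {"EN": "English", "DE": "German", "FR": "French", "ES": "Spanish"}
code_to_language = dict(zip(language_map["countryCode"], language_map["Language"]))


def language_converter(code):
    # dict.get returns the default (NaN) for codes missing from the table.
    return code_to_language.get(code, np.nan)


data["Language"] = data["countryCode"].apply(language_converter)
print(data)
```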
However, instead of defining and applying language_converter, it is much simpler to map the country codes directly:
data['Language'] = data['countryCode'].map(language_map.set_index("countryCode")['Language'])
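The same .map call as a self-contained sketch, with one addition: the question allows a blank instead of NaN, and fillna("") turns the unmatched entries into empty strings. The column name LanguageOrBlank is only for illustration. This lookup assumes each countryCode appears only once in the mapping table.

```python
import pandas as pd

language_map = pd.DataFrame({
    "Language": ["English", "German", "French", "Spanish"],
    "countryCode": ["EN", "DE", "FR", "ES"],
})
data = pd.DataFrame({
    "Country": ["Australia", "Argentina", "Mexico", "Algeria", "China"],
    "countryCode": ["EN", "ES", "ES", "FR", "CN"],
})

# A Series indexed by countryCode works as a lookup table for .map;
# codes missing from the index (CN) become NaN, and unused codes (DE) are ignored.
lookup = language_map.set_index("countryCode")["Language"]
data["Language"] = data["countryCode"].map(lookup)

# Optional: show unmatched codes as an empty string instead of NaN.
data["LanguageOrBlank"] = data["Language"].fillna("")
print(data)
```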