Pandas bringing in data from another dataframe

Time:02-04

I am trying to bring data from a mapping table (a dataframe) into another dataframe using the code below, but I get the error NameError: name 'x' is not defined. What am I doing wrong?

Note: for values that are not in the mapping table (China/CN) I would like the result to be blank or NaN. Values that are in the mapping table but not in my data should not be included.

import pandas as pd

languages = {'Language': ["English", "German", "French", "Spanish"],
            'countryCode': ["EN", "DE", "FR", "ES"]
            }

countries = {'Country': ["Australia", "Argentina", "Mexico", "Algeria", "China"],
             'countryCode': ["EN", "ES", "ES", "FR", "CN"]
            }

language_map = pd.DataFrame(languages)
data = pd.DataFrame(countries)

def language_converter(x):
    return language_map.query(f"countryCode=='{x}'")['Language'].values[0]

data['Language'] = data['countryCode'].apply(language_converter(x))

CodePudding user response:

Use pandas.DataFrame.merge with a left join, which keeps every row of data and fills NaN where the mapping table has no match:

data.merge(language_map, how='left')

Output:

     Country countryCode Language
0  Australia          EN  English
1  Argentina          ES  Spanish
2     Mexico          ES  Spanish
3    Algeria          FR   French
4      China          CN      NaN
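
As a minimal, self-contained sketch of the same left join, the version below names the join key explicitly and adds validate="m:1". That check is an optional addition, not part of the answer above; it assumes each countryCode should appear only once in the mapping table. The names merged and unmatched are illustrative only.

```python
import pandas as pd

language_map = pd.DataFrame({
    "Language": ["English", "German", "French", "Spanish"],
    "countryCode": ["EN", "DE", "FR", "ES"],
})
data = pd.DataFrame({
    "Country": ["Australia", "Argentina", "Mexico", "Algeria", "China"],
    "countryCode": ["EN", "ES", "ES", "FR", "CN"],
})

# Name the join key explicitly. validate="m:1" raises a MergeError if a
# countryCode occurs more than once in the mapping table (assumed unique).
merged = data.merge(language_map, on="countryCode", how="left", validate="m:1")

# Codes with no entry in the mapping table end up with NaN in Language.
unmatched = merged.loc[merged["Language"].isna(), "countryCode"].tolist()

print(merged)
print(unmatched)
```

Because the join is a left join, German (DE) never appears in the result: it is in the mapping table but not in data.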

CodePudding user response:

.apply expects a callable object, but you passed language_converter(x), which is a function call. Python evaluates that call before apply ever runs, and at that point no variable x exists, hence the NameError.

A valid usage is: .apply(language_converter).
But then you'll hit another error, IndexError: index 0 is out of bounds for axis 0 with size 0, because some country codes (here CN) are not in the mapping table, so .values is empty and the indexing .values[0] fails.
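
Both points can be reproduced with a small sketch. It reuses the function from the question unchanged; the names known and error, and the head(4) slice that skips the CN row, are only there for illustration.

```python
import pandas as pd

language_map = pd.DataFrame({
    "Language": ["English", "German", "French", "Spanish"],
    "countryCode": ["EN", "DE", "FR", "ES"],
})
data = pd.DataFrame({
    "Country": ["Australia", "Argentina", "Mexico", "Algeria", "China"],
    "countryCode": ["EN", "ES", "ES", "FR", "CN"],
})

def language_converter(x):
    return language_map.query(f"countryCode=='{x}'")["Language"].values[0]

# Passing the function itself (no parentheses) works for codes that exist
# in the mapping table; the first four rows all have a match.
known = data["countryCode"].head(4).apply(language_converter).tolist()

# "CN" matches no row, so .values is empty and indexing [0] raises IndexError.
try:
    language_converter("CN")
    error = None
except IndexError as exc:
    error = exc

print(known)
print(repr(error))
```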

If you want to stay with your original approach, a valid version would look like this:

import numpy as np

def language_converter(x):
    # Languages whose countryCode equals x (an empty array if none match)
    lang = language_map[language_map["countryCode"] == x]['Language'].values
    return lang[0] if lang.size > 0 else np.nan

data['Language'] = data['countryCode'].apply(language_converter)
print(data)

     Country countryCode Language
0  Australia          EN  English
1  Argentina          ES  Spanish
2     Mexico          ES  Spanish
3    Algeria          FR   French
4      China          CN      NaN

However, instead of defining and applying language_converter, it is simpler to map the country codes directly:

data['Language'] = data['countryCode'].map(language_map.set_index("countryCode")['Language'])
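
For reference, here is the map approach as a self-contained sketch built on the data from the question. The intermediate name lookup is introduced only for readability, and the sketch assumes each countryCode appears once in the mapping table, since mapping through a Series needs a unique index.

```python
import pandas as pd

language_map = pd.DataFrame({
    "Language": ["English", "German", "French", "Spanish"],
    "countryCode": ["EN", "DE", "FR", "ES"],
})
data = pd.DataFrame({
    "Country": ["Australia", "Argentina", "Mexico", "Algeria", "China"],
    "countryCode": ["EN", "ES", "ES", "FR", "CN"],
})

# Series indexed by countryCode with Language as values; it acts as a lookup table.
lookup = language_map.set_index("countryCode")["Language"]

# Each code is looked up in the Series; codes without an entry ("CN") become NaN.
data["Language"] = data["countryCode"].map(lookup)

print(data)
```

Unlike merge, this writes the new column into data in place and keeps its original index.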