Full country name to country code in Dataframe-CodePudding

I have these kind of countries in the dataframe. There are some with full country names, there are some with alpha-2.

         Country
------------------------
8836     United Kingdom
1303                 ES
7688     United Kingdom
12367                FR
7884     United Kingdom
6844     United Kingdom
3706     United Kingdom
3567                 UK
6238                 FR
588                  UK
4901     United Kingdom
568                  UK
4880     United Kingdom
11284            France
1273              Spain
2719             France
1386                 UK
12838    United Kingdom
868              France
1608                 UK
Name: Country, dtype: object

Note: Some data in Country are empty.

How will I be able to create a new column with the alpha-2 country codes in it?

Country          | Country Code
---------------------------------------
United Kingdom   | UK
France           | FR
FR               | FR
UK               | UK
Italy            | IT
Spain            | ES
ES               | ES
...

CodePudding user response：

You can try this, as already mentioned in the comment by me earlier.

import pandas as pd
df = pd.DataFrame([[1, 'UK'],[2, 'United Kingdom'],[3, 'ES'],[2, 'Spain']], columns=['id', 'Country'])

#Create copy of country column as alpha-2
df['alpha-2'] = df['Country']

#Create a look up with required values
lookup_table = {'United Kingdom':'UK', 'Spain':'ES'}

#replace the alpha-2 column with lookup values.
df = df.replace({'alpha-2':lookup_table})
print(df)

Output

CodePudding user response：

You will have to define a dictionary for the replacements (or find a library that does it for you). The abbreviations look pretty close the IBAN codes to me. But the biggest stickout was United Kingdom => GB as opposed to UK in your example.

I would start with the IBAN codes and define a big dictionary like this:

mappings = {
    "Afghanistan": "AF",
    "Albania": "AL",
    ...
}

df["Country Code"] = df["Country"].replace(mappings)