I have these kind of countries in the dataframe. There are some with full country names, there are some with alpha-2.
Country
------------------------
8836 United Kingdom
1303 ES
7688 United Kingdom
12367 FR
7884 United Kingdom
6844 United Kingdom
3706 United Kingdom
3567 UK
6238 FR
588 UK
4901 United Kingdom
568 UK
4880 United Kingdom
11284 France
1273 Spain
2719 France
1386 UK
12838 United Kingdom
868 France
1608 UK
Name: Country, dtype: object
Note: Some data in Country are empty.
How will I be able to create a new column with the alpha-2 country codes in it?
Country | Country Code
---------------------------------------
United Kingdom | UK
France | FR
FR | FR
UK | UK
Italy | IT
Spain | ES
ES | ES
...
CodePudding user response:
You can try this, as already mentioned in the comment by me earlier.
import pandas as pd
df = pd.DataFrame([[1, 'UK'],[2, 'United Kingdom'],[3, 'ES'],[2, 'Spain']], columns=['id', 'Country'])
#Create copy of country column as alpha-2
df['alpha-2'] = df['Country']
#Create a look up with required values
lookup_table = {'United Kingdom':'UK', 'Spain':'ES'}
#replace the alpha-2 column with lookup values.
df = df.replace({'alpha-2':lookup_table})
print(df)
Output
CodePudding user response:
You will have to define a dictionary for the replacements (or find a library that does it for you). The abbreviations look pretty close the IBAN codes to me. But the biggest stickout was United Kingdom => GB
as opposed to UK
in your example.
I would start with the IBAN codes and define a big dictionary like this:
mappings = {
"Afghanistan": "AF",
"Albania": "AL",
...
}
df["Country Code"] = df["Country"].replace(mappings)