I have this huge dataset in which I have to replace each country's name with the corresponding ISO code. I have stored the ISO code of each country into another df. e.g.
df1
:
TERRITORY |
---|
France, Italy |
Italy |
Spain, France |
France, Spain, Italy |
df2
:
COUNTRY | ISO CODE |
---|---|
France | FR |
Italy | IT |
Spain | ES |
Expected output:
TERRITORY |
---|
FR, IT |
IT |
ES, FR |
FR, ES, IT |
My last try was to convert the element into tuples and then replace, but it doesn't work (and I don't know if it makes sense, but here's my code anyway):
country = tuple(list(df2['COUNTRY']))
iso = tuple(list(df2['ISO CODE']))
z = zip(x, y)
for x, y in z:
if x in df1['TERRITORY']:
df1['TERRITORY'].str.replace(x, y)
But it doesn't change anything.
CodePudding user response:
You can use a combination of .str.split
.explode
, then .replace
.set_index
, and finally .groupby(level=0)
agg(list)
.str.join
:
df1['TERRITORY'] = df1['TERRITORY'].str.split(', ').explode().replace(df2.set_index('COUNTRY')['ISO CODE']).groupby(level=0).agg(list).str.join(', ')
Output:
>>> df1
TERRITORY
0 FR, IT
1 IT
2 ES, FR
3 FR, ES, IT
CodePudding user response:
If you have all the mapping you can also just use replace()
:
df1.replace(df2.set_index('COUNTRY').squeeze(), regex=True)
Output:
Territory
0 FR, IT
1 IT
2 ES, FR
3 FR, ES, IT