Suppose we have the following pandas dataframe:
df
Age Country
0 10 1
1 15 2
2 20 3
3 25 1
4 30 2
5 15 3
6 20 3
7 15 4
8 20 4
The desired result is:
Age Country Continent
0 10 1 Africa
1 15 2 Asia
2 20 3 Africa
3 25 1 Africa
4 30 2 Asia
5 15 3 Africa
6 20 3 Africa
7 15 4 Asia
8 20 4 Asia
How can we group df
if the number of countries is huge? One possible solution is given here:
continent_lookup = {1: 'Africa', 2: 'Asia', 3: 'Africa', 4: 'Asia'}
df['Continent'] = df.Country.map(continent_lookup)
However, if the number of countries is very large, it would be convenient to be able to pass a dictionary of the following form:
convenient_continent_lookup = {'Africa': [1,3], 'Asia': [2,4]}
One way to go from here would be to convert convenient_continent_lookup
to a dataframe as mentioned in this answer to get df2
:
df2
Country Continent
0 1 Africa
1 2 Asia
2 3 Africa
3 1 Africa
4 2 Asia
5 3 Africa
6 3 Africa
7 4 Asia
8 4 Asia
If we would now join df
with df2
on column 'Continent' we get the desired result. Is there an easier solution?
CodePudding user response:
You don't need to create a new dataframe at all. Using list comphrension, you can convert the convenient_continent_lookup
dict into a dict that map
can understand:
df['Continent'] = df['Country'].map({v: k for k, vals in convenient_continent_lookup.items() for v in vals})
Output:
>>> df
Age Country Continent
0 10 1 Africa
1 15 2 Asia
2 20 3 Africa
3 25 1 Africa
4 30 2 Asia
5 15 3 Africa
6 20 3 Africa
7 15 4 Asia
8 20 4 Asia
CodePudding user response:
Let us check explode
merge
s = pd.Series(convenient_continent_lookup).explode().reset_index()
s.columns = ['Continent','Country']
df = df.merge(s,how='left')
df
Out[359]:
Age Country Continent
0 10 1 Africa
1 25 1 Africa
2 15 2 Asia
3 30 2 Asia
4 20 3 Africa
5 15 3 Africa
6 20 3 Africa
7 15 4 Asia
8 20 4 Asia