Home > Software engineering >  Is there an alternative to map in pandas to determine a custom order of binning values?
Is there an alternative to map in pandas to determine a custom order of binning values?

Time:12-15

Suppose we have the following pandas dataframe:

df 
    Age   Country
0    10     1
1    15     2
2    20     3
3    25     1
4    30     2
5    15     3
6    20     3
7    15     4
8    20     4

The desired result is:

    Age   Country  Continent
0    10     1       Africa
1    15     2        Asia
2    20     3       Africa
3    25     1       Africa
4    30     2        Asia
5    15     3       Africa
6    20     3       Africa
7    15     4        Asia
8    20     4        Asia

How can we group df if the number of countries is huge? One possible solution is given here:

continent_lookup = {1: 'Africa', 2: 'Asia', 3: 'Africa', 4: 'Asia'}
df['Continent'] = df.Country.map(continent_lookup)

However, if the number of countries is very large, it would be convenient to be able to pass a dictionary of the following form:

convenient_continent_lookup = {'Africa': [1,3], 'Asia': [2,4]}

One way to go from here would be to convert convenient_continent_lookup to a dataframe as mentioned in this answer to get df2:

df2
   Country  Continent
0     1       Africa
1     2        Asia
2     3       Africa
3     1       Africa
4     2        Asia
5     3       Africa
6     3       Africa
7     4        Asia
8     4        Asia

If we would now join df with df2 on column 'Continent' we get the desired result. Is there an easier solution?

CodePudding user response:

You don't need to create a new dataframe at all. Using list comphrension, you can convert the convenient_continent_lookup dict into a dict that map can understand:

df['Continent'] = df['Country'].map({v: k for k, vals in convenient_continent_lookup.items() for v in vals})

Output:

>>> df
   Age  Country Continent
0   10        1    Africa
1   15        2      Asia
2   20        3    Africa
3   25        1    Africa
4   30        2      Asia
5   15        3    Africa
6   20        3    Africa
7   15        4      Asia
8   20        4      Asia

CodePudding user response:

Let us check explode merge

s = pd.Series(convenient_continent_lookup).explode().reset_index()
s.columns = ['Continent','Country']
df = df.merge(s,how='left')
df
Out[359]: 
   Age Country Continent
0   10       1    Africa
1   25       1    Africa
2   15       2      Asia
3   30       2      Asia
4   20       3    Africa
5   15       3    Africa
6   20       3    Africa
7   15       4      Asia
8   20       4      Asia
  • Related