I would like to efficiently transform a dataframe df
of the below structure
country | sector | production |
---|---|---|
US | automotive | 100 |
US | aviation | 50 |
CA | automotive | 30 |
CA | aviation | 15 |
JP | automotive | 95 |
JP | aviation | 25 |
using the mapping (avilable as dictionary mapping_dict
)
region | countries |
---|---|
region_1 | US, CA |
region_2 | US, JP |
into the aggregated pivot-table of the form
region_1 | region_2 | |
---|---|---|
automotive | 115 | 195 |
aviation | 1200 | 1400 |
without using loops. Before using pivot
, I tried to aggregate by using the dictionary and map
, but this won't work due to countries belonging to multiple regions.
df['country'] = df['country'].map(mapping_dict)
MWE input data:
df = pd.DataFrame(
{
"country": ['US', 'US', 'CA', 'CA', 'JP', 'JP'],
"sector": ['automotive', 'aviation', 'automotive', 'aviation', 'automotive', 'aviation'],
"production": [100, 50, 30, 15, 95, 25]
}
)
dict_country_per_region = {'region_1': 'US, CA', 'region_2': 'US, JP'}
CodePudding user response:
You can create a new dataframe for the region-country relation, then merge, and groupby().sum().unstack()
or pivot_table
:
# if you don't already have the dataframe
regions = pd.DataFrame(list(dict_country_per_region.items()), columns=['region','country'])
# split the country strings, then explode
regions = regions.assign(country=regions['country'].str.split(', ')).explode('country')
(df.merge(regions, on='country')
.groupby(['region','sector'])['production'].sum()
.unstack('region', fill_value=0)
)
Output:
region region_1 region_2
sector
automotive 130 195
aviation 65 75
CodePudding user response:
#create a dataframe from the dictionary
df2=pd.DataFrame.from_dict(dict_country_per_region, orient='index', columns=['country'])
#split the list values into rows for easy mapping
df2=df2['country'].str.replace(r'\s','',regex=True).str.split(',').apply(list).explode().reset_index()
df2
#merge
df=df.merge(df2,
on='country',
how='left')
df.pivot_table(index='sector', columns='index', values='production', aggfunc='sum').reset_index()
index sector region_1 region_2
0 automotive 130 195
1 aviation 65 75