Home > front end >  Aggergate dataframe using dictionary mapping for pivot table
Aggergate dataframe using dictionary mapping for pivot table

Time:09-16

I would like to efficiently transform a dataframe df of the below structure

country sector production
US automotive 100
US aviation 50
CA automotive 30
CA aviation 15
JP automotive 95
JP aviation 25

using the mapping (avilable as dictionary mapping_dict)

region countries
region_1 US, CA
region_2 US, JP

into the aggregated pivot-table of the form

region_1 region_2
automotive 115 195
aviation 1200 1400

without using loops. Before using pivot, I tried to aggregate by using the dictionary and map, but this won't work due to countries belonging to multiple regions.

df['country'] = df['country'].map(mapping_dict)

MWE input data:

df = pd.DataFrame(
    {
        "country": ['US', 'US', 'CA', 'CA', 'JP', 'JP'],
        "sector": ['automotive', 'aviation', 'automotive', 'aviation', 'automotive', 'aviation'],
        "production": [100, 50, 30, 15, 95, 25]
    }
)
dict_country_per_region = {'region_1': 'US, CA', 'region_2': 'US, JP'}

CodePudding user response:

You can create a new dataframe for the region-country relation, then merge, and groupby().sum().unstack() or pivot_table:

# if you don't already have the dataframe
regions = pd.DataFrame(list(dict_country_per_region.items()), columns=['region','country'])

# split the country strings, then explode
regions = regions.assign(country=regions['country'].str.split(', ')).explode('country')

(df.merge(regions, on='country')
   .groupby(['region','sector'])['production'].sum()
   .unstack('region', fill_value=0)
)

Output:

region      region_1  region_2
sector                        
automotive       130       195
aviation          65        75

CodePudding user response:

#create a dataframe from the dictionary
df2=pd.DataFrame.from_dict(dict_country_per_region, orient='index', columns=['country'])

#split the list values into rows for easy mapping
df2=df2['country'].str.replace(r'\s','',regex=True).str.split(',').apply(list).explode().reset_index()
df2

#merge
df=df.merge(df2,
        on='country',
        how='left')

df.pivot_table(index='sector', columns='index', values='production', aggfunc='sum').reset_index()


index   sector     region_1     region_2
0       automotive      130        195
1       aviation         65         75
  • Related