Home > Net >  Pandas Pivot Table with multilevel index
Pandas Pivot Table with multilevel index

Time:03-08

I have a df with items and their yearly sales. I would like to change it to pivot table but with the two levels index.

My DF:

date    brand_id    brand_name  art_id  art_name    count_art
2015    1           cat         10      A           120
2016    1           cat         10      A           100
2017    1           cat         12      B           80
2015    2           dog         20      C           100
2016    2           dog         25      D           110
2015    3           bird        30      E           50
2017    3           bird        31      F           90

I want the result to be like this:

                                2015                            2016                            2017            
brand_id    brand_name  art_id  art_name    count_art   art_id  art_name    count_art   art_id  art_name    count_art
1           cat         10      A           120         10      A           100         12      B           80      
2           dog         20      C           100         25      D           110         null    null        null    
3           bird        30      E           50          null    null        null        31      F           90  

For now I have tried following command:

transformed_data = df.pivot_table(values=['art_id', 'art_name', 'count_art'], index=['brand_id', 'brand_name'], columns='date', aggfunc='first')

However it isn't working as expected. I know how to change rows to yearly columns however I don't know how to change multiple columns in multiple rows to one row with more columns.

CodePudding user response:

Add DataFrame.swaplevel with DataFrame.sort_index:

df = (df.pivot_table(values=['art_id', 'art_name', 'count_art'], 
                    index=['brand_id', 'brand_name'], 
                    columns='date', 
                    aggfunc='first')
        .swaplevel(1, 0, axis=1)
        .sort_index(level=0, axis=1, sort_remaining=False))
print (df)
date                  2015                      2016                     \
                    art_id art_name count_art art_id art_name count_art   
brand_id brand_name                                                       
1        cat          10.0        A     120.0   10.0        A     100.0   
2        dog          20.0        C     100.0   25.0        D     110.0   
3        bird         30.0        E      50.0    NaN      NaN       NaN   

date                  2017                     
                    art_id art_name count_art  
brand_id brand_name                            
1        cat          12.0        B      80.0  
2        dog           NaN      NaN       NaN  
3        bird         31.0        F      90.0 

CodePudding user response:

IIUC, use pivot_table command to include the values in the desired order. Then, use swaplevel to reorder your levels, and sort_index with sort_remaining=False to ensure only the dates are sorted:

new_cols = ['art_id', 'art_name', 'count_art']
transformed_data = (
 df.pivot_table(values=new_cols,
               index=['brand_id', 'brand_name'],
               columns=['date'], aggfunc='first')
   [new_cols]
   .swaplevel(axis=1)
   .sort_index(level=0, axis=1, sort_remaining=False)
)

output:

date                  2015                      2016                      2017                   
                    art_id art_name count_art art_id art_name count_art art_id art_name count_art
brand_id brand_name                                                                              
1        cat          10.0        A     120.0   10.0        A     100.0   12.0        B      80.0
2        dog          20.0        C     100.0   25.0        D     110.0    NaN      NaN       NaN
3        bird         30.0        E      50.0    NaN      NaN       NaN   31.0        F      90.0
  • Related