Home > OS >  How to change a multiindex into columns in Panda
How to change a multiindex into columns in Panda

Time:08-02

I am trying to aggregate sales data using Pandas. Each line of the input file has a date, sales, category and date where there can be multiple entries for a category for a date.

import pandas as pd
from datetime import date

df = pd.DataFrame( [
    { 'Date': date(2022,4,1), 'Category': 'Food', 'Amount': 11.0 },
    { 'Date': date(2022,4,1), 'Category': 'Soda', 'Amount': 3.0 },
    { 'Date': date(2022,4,1), 'Category': 'Soda', 'Amount': 2.0 },
    { 'Date': date(2022,4,1), 'Category': 'Food', 'Amount': 13.0 },

    { 'Date': date(2022,4,2), 'Category': 'Candy', 'Amount': 1.0 },
    { 'Date': date(2022,4,2), 'Category': 'Candy', 'Amount': 0.5 },
    { 'Date': date(2022,4,2), 'Category': 'Food', 'Amount': 15.0 },
    { 'Date': date(2022,4,2), 'Category': 'Soda', 'Amount': 2.0 },
    { 'Date': date(2022,4,2), 'Category': 'Soda', 'Amount': 1.0 },

    { 'Date': date(2022,4,3), 'Category': 'Candy', 'Amount': 2.0 },
    { 'Date': date(2022,4,3), 'Category': 'Food', 'Amount': 18.0 },
    { 'Date': date(2022,4,3), 'Category': 'Food', 'Amount': 11.0 },
] )

I can use groupby to sum the entries for a category and I end up with a multi-index on Date and Category:

b = df.groupby(['Date', 'Category']).sum()
print(b)
                     Amount
Date       Category        
2022-04-01 Food       24.00
           Soda        5.00
2022-04-02 Candy       1.75
           Food       15.00
           Soda        5.00
2022-04-03 Candy       0.60
           Food       11.00
           Soda        3.00

How can I transform this so the different categories are columns with the date as the index, something like this:

            Food  Soda  Candy
2022-04-01  24.0  5.0   0.0
2022-04-02  15.0  5.0   1.75
2022-04-03  11.0  3.0   0.6

I've tried pivot tables, crosstabs (xs) and unstacking and can't figure out the right Pandas commands to get there!

CodePudding user response:

Using crosstab:

import numpy as np

out = (pd.crosstab(df['Date'], df['Category'], df['Amount'], aggfunc=np.sum)
         .fillna(0)
       )

Output:

Category    Candy  Food  Soda
Date                         
2022-04-01    0.0  24.0   5.0
2022-04-02    1.5  15.0   3.0
2022-04-03    2.0  29.0   0.0

Modification of your method with unstack:

out = (df.groupby(['Date', 'Category'])['Amount'].sum()
         .unstack(fill_value=0)
       )

CodePudding user response:

You can use pd.pivot_table with sum as aggfunc

df.pivot_table(index='Date', columns='Category', values='Amount', aggfunc='sum', fill_value=0)

Output:

Category    Candy  Food  Soda
Date                         
2022-04-01    0.0    24     5
2022-04-02    1.5    15     3
2022-04-03    2.0    29     0

Because you mentioned next to crosstab (mozway's answer) and pivot_table also unstack, here a way you could do it with that:

df.set_index(['Date', 'Category'],append=True).unstack().groupby('Date').sum()
  • Related