Home > Net >  Different questions about pandas pivot tables
Different questions about pandas pivot tables

Time:11-03

Here's my df:

df=pd.DataFrame(
    {
        'Color': ['red','blue','red','red','green','red','yellow'],
        'Type': ['Oil', 'Aluminium', 'Oil', 'Oil', 'Cement Paint', 'Synthetic Rubber', 'Emulsion'],
        'Finish' : ['Satin', 'Matte', 'Matte', 'Satin', 'Semi-gloss', 'Satin', 'Satin'],
        'Use' : ['Interior', 'Exterior', 'Interior', 'Interior', 'Exterior', 'Exterior', 'Exterior'],
        'Price' : [55, 75, 60, 60, 55, 75, 50]
    }
)

I want to create a pivot table that will output 'Color', 'color count', the percentage or weight or each count of color, and finally a total row, outputting the total color count next to 100%. Additionally, I'd like to add a header with today's date in the following format (02 - Nov).

Here is my current pivot with the aproximating inputs

today=datetime.date.today()
today_format=today.strftime("%d-m%")

pivot_table=pd.pivot_table(
    data=df,
    index='Color',
    aggfunc={'Color':'count'}
)

df['Color'].value_counts(
    normalize=True
).mul(100).round(1).astype(str)   '%'

Is there a way to add more information to the pivot as a header, total and extra column? Or just I just try to convert the pivot back to a DF and edit it from there?

The main difficulty I'm finding is that since I'm handling string data, when I 'aggfunc='sum' it actually adds the strings. And If I try to add 'margins=True, margins_name='Total count' I get the following error:

    if isinstance(aggfunc[k], str):
KeyError: 'Type'

The desired table output would look something like this:

enter image description here

Thanks for your help!

CodePudding user response:

Updated Answer

Thanks to a great suggestion by Rabinzel, we can also have today's date as a column header as well:

df = (df['Color'].value_counts().reset_index().pivot_table(index = ['index'], aggfunc = np.sum, margins=True, margins_name='Total')
 .assign(perc = lambda x: x['Color']/x.iloc[:-1]['Color'].sum() * 100)
 .rename(columns = {'Color' : 'Color Count', 
                    'perc' : '%'}))

new_cols = pd.MultiIndex.from_product([[datetime.today().strftime('%#d-%b')], df.columns])
df.columns = new_cols
df

             2-Nov            
       Color Count           %
index                         
blue             1   14.285714
green            1   14.285714
red              4   57.142857
yellow           1   14.285714
Total            7  100.000000
  • Related