Home > database >  pivot a pandas dataframe
pivot a pandas dataframe

Time:08-24

I have a dataframe

df = pd.DataFrame({'date':['2021-06','2021-06','2021-09','2021-08','2021-09'],'type':['t1','t1','t1','t2','t2'], 'other_col':['a','b','b','a','c']})

and would like to pivot it such that I get the following output.

date        2021-06  2021-08  2021-09
t1    count 2          0         1
      mean  100%       0%        50%
t2    count 0          1         1
      mean  0%         100%      50%

But I could not find out how to do it.

CodePudding user response:

You can use a list of aggfunc then stack the column header

count = (df.pivot_table(index='type', columns='date', values='other_col',
                        fill_value=0 ,aggfunc=['count'])
         .stack(level=0))

mean = (count.div(count.sum()).mul(100)
        .applymap("{:,.2f}%".format)
        .rename(index={'count': 'mean'}))

out = pd.concat([count, mean]).sort_index()
print(count)

date        2021-06  2021-08  2021-09
type
t1   count        2        0        1
t2   count        0        1        1

print(mean)

date       2021-06  2021-08 2021-09
type
t1   mean  100.00%    0.00%  50.00%
t2   mean    0.00%  100.00%  50.00%


print(out)

date        2021-06  2021-08 2021-09
type
t1   count        2        0       1
     mean   100.00%    0.00%  50.00%
t2   count        0        1       1
     mean     0.00%  100.00%  50.00%

CodePudding user response:

I don't there's a single aggfunc that'll do this, but you could do:

df1 = df.pivot_table(index='type', columns='date', values='other_col', aggfunc='count').fillna(0)
df1.index = pd.MultiIndex.from_arrays([df1.index, ['count']*len(df1)])
df2 = (df1 / df1.sum(axis=0) * 100).astype(str) '%'
df2.index = pd.MultiIndex.from_arrays([df2.index.get_level_values(0), ['mean']*len(df2)])
pd.concat([df1, df2]).sort_index()

which'll give

date       2021-06 2021-08 2021-09
type                              
t1   count     2.0     0.0     1.0
     mean   100.0%    0.0%   50.0%
t2   count     0.0     1.0     1.0
     mean     0.0%  100.0%   50.0%

CodePudding user response:

Suppose you forgot to include values:

df['values'] = [1,2,3,4,5]

So:

pt = pd.pivot_table(df,aggfunc=[np.mean, 'count'], values='values',columns='date').T

pt.index = pt.index.swaplevel(0,1)
  • Related