I have a dataframe
df = pd.DataFrame({'date':['2021-06','2021-06','2021-09','2021-08','2021-09'],'type':['t1','t1','t1','t2','t2'], 'other_col':['a','b','b','a','c']})
and would like to pivot it such that I get the following output.
date 2021-06 2021-08 2021-09
t1 count 2 0 1
mean 100% 0% 50%
t2 count 0 1 1
mean 0% 100% 50%
But I could not find out how to do it.
CodePudding user response:
You can use a list of aggfunc
then stack the column header
count = (df.pivot_table(index='type', columns='date', values='other_col',
fill_value=0 ,aggfunc=['count'])
.stack(level=0))
mean = (count.div(count.sum()).mul(100)
.applymap("{:,.2f}%".format)
.rename(index={'count': 'mean'}))
out = pd.concat([count, mean]).sort_index()
print(count)
date 2021-06 2021-08 2021-09
type
t1 count 2 0 1
t2 count 0 1 1
print(mean)
date 2021-06 2021-08 2021-09
type
t1 mean 100.00% 0.00% 50.00%
t2 mean 0.00% 100.00% 50.00%
print(out)
date 2021-06 2021-08 2021-09
type
t1 count 2 0 1
mean 100.00% 0.00% 50.00%
t2 count 0 1 1
mean 0.00% 100.00% 50.00%
CodePudding user response:
I don't there's a single aggfunc
that'll do this, but you could do:
df1 = df.pivot_table(index='type', columns='date', values='other_col', aggfunc='count').fillna(0)
df1.index = pd.MultiIndex.from_arrays([df1.index, ['count']*len(df1)])
df2 = (df1 / df1.sum(axis=0) * 100).astype(str) '%'
df2.index = pd.MultiIndex.from_arrays([df2.index.get_level_values(0), ['mean']*len(df2)])
pd.concat([df1, df2]).sort_index()
which'll give
date 2021-06 2021-08 2021-09
type
t1 count 2.0 0.0 1.0
mean 100.0% 0.0% 50.0%
t2 count 0.0 1.0 1.0
mean 0.0% 100.0% 50.0%
CodePudding user response:
Suppose you forgot to include values:
df['values'] = [1,2,3,4,5]
So:
pt = pd.pivot_table(df,aggfunc=[np.mean, 'count'], values='values',columns='date').T
pt.index = pt.index.swaplevel(0,1)