import numpy as np
import pandas as pd
data = {'experiment_name': ['exp1', 'exp1', 'exp1', 'exp1', 'exp1', 'exp1'],
'variant': ['A', 'B', 'A','B','A','B'],'sessions_with_orders':[1,2,6,0,23,12],
'total_sessions':[10,23,56,22,89,12]}
Create DataFrame
df = pd.DataFrame(data)
final_pivot=(df.pivot_table(index='variant',columns='experiment_name',values=['total_sessions','sessions_with_orders'],aggfunc=np.sum)
.assign(ratio=lambda d: d['sessions_with_orders']/d['total_sessions']) )
final_pivot_reset=final_pivot.reset_index()
##this makes an accessible data frame using .iloc
type(final_pivot_reset)
pandas.core.frame.DataFrame
I don't think final_pivot.reset_index()
is doing correct thing?
Further, I have challenge in accessing column/names from the reset output. When I look at the column names they are hierarchal (default output from pivot_table
).
final_pivot_reset.columns
MultiIndex([( 'variant', ''),
('sessions_with_orders', 'exp1'),
( 'total_sessions', 'exp1'),
( 'ratio', '')],
names=[None, 'experiment_name'])
For example, I would like to access final_pivot_reset.ratio
how do I achieve it?
I had created a post on how to create custom column on the fly for pivot_table pandas - pivot || create custom column for ratio/percentage
Thank you.
CodePudding user response:
You can flatten MultiIndex
after pivot_table
with separator _
, so change columns names for ratio
. Last if need convert index
to column variant
add DataFrame.reset_index
:
final_pivot=df.pivot_table(index='variant',
columns='experiment_name',
values=['total_sessions','sessions_with_orders'],
aggfunc=np.sum)
final_pivot.columns = [f'{a}_{b}' for a, b in final_pivot.columns]
final_pivot = (final_pivot.assign(ratio=lambda d: d['sessions_with_orders_exp1']/d['total_sessions_exp1'])
.reset_index())
print (final_pivot)
variant sessions_with_orders_exp1 total_sessions_exp1 ratio
0 A 30 155 0.193548
1 B 14 57 0.245614