my df looks like this:
segment group purchase_amount
A control 2601
A variant1 2608
A variant2 2586
B control 2441
B variant1 2712
B variant2 2710
I would like a table where WITHIN segments, I can compare all variants with the respective control. Something like this:
segment %lift purchase_amount
A variant1 (2608-2601)/2601
A variant2 (2586-2601)/2601
B variant1 (2712-2441)/2441
B variant2 (2710-2441)/2441
How can I do this with pandas?
CodePudding user response:
You can use .groupby
to apply the function.
def apply_control(group):
control = group.loc[group.group == 'control', 'purchase_amount'].values[0]
return (group.purchase_amount - control) / control
result = df.assign(purchase_amount=(
df
.groupby('segment')
.apply(apply_control)
.reset_index(level=0, drop=True)
)).loc[df.group != 'control']
This assumes that there is one row with the value 'control' in each group. The rest of the code resets the index in order to assign it back to the dataframe, overwriting the existing purchase_amount
column and then filtering out the control rows.
output
segment group purchase_amount
1 A variant1 0.002691
2 A variant2 -0.005767
4 B variant1 0.111020
5 B variant2 0.110201
CodePudding user response:
You can do:
m = df['group'].eq('control')
out = df[~m].copy()
controls = out['segment'].map(df[m].set_index('segment')['purchase_amount'])
out['purchase_amount'] = (out['purchase_amount'] - controls)/(controls)
print(out)
segment group purchase_amount
1 A variant1 0.002691
2 A variant2 -0.005767
4 B variant1 0.111020
5 B variant2 0.110201