How to create %lift from pandas dataframe comparing several variants to control group?-CodePudding

my df looks like this:

segment    group        purchase_amount 
A          control      2601    
A          variant1     2608    
A          variant2     2586    
B          control      2441    
B          variant1     2712    
B          variant2     2710

I would like a table where WITHIN segments, I can compare all variants with the respective control. Something like this:

segment    %lift        purchase_amount 
A          variant1     (2608-2601)/2601
A          variant2     (2586-2601)/2601    
B          variant1     (2712-2441)/2441    
B          variant2     (2710-2441)/2441

How can I do this with pandas?

CodePudding user response：

You can use .groupby to apply the function.

def apply_control(group):
    control = group.loc[group.group == 'control', 'purchase_amount'].values[0]
    return (group.purchase_amount - control) / control

result = df.assign(purchase_amount=(
    df
    .groupby('segment')
    .apply(apply_control)
    .reset_index(level=0, drop=True)
)).loc[df.group != 'control']

This assumes that there is one row with the value 'control' in each group. The rest of the code resets the index in order to assign it back to the dataframe, overwriting the existing purchase_amount column and then filtering out the control rows.

output

  segment     group  purchase_amount
1       A  variant1         0.002691
2       A  variant2        -0.005767
4       B  variant1         0.111020
5       B  variant2         0.110201

CodePudding user response：

You can do:

m = df['group'].eq('control')
out = df[~m].copy()
controls = out['segment'].map(df[m].set_index('segment')['purchase_amount'])
out['purchase_amount'] = (out['purchase_amount'] - controls)/(controls)

print(out)

  segment     group  purchase_amount
1       A  variant1         0.002691
2       A  variant2        -0.005767
4       B  variant1         0.111020
5       B  variant2         0.110201