How to Calculate Dropoff by Unique Field in Pandas DataFrame with Duplicates

import numpy as np
import pandas as pd

# One row per observation; a user can appear in several rows.
df = pd.DataFrame({
    'user': ['A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'step_1': [True, True, True, True, True, True, True],
    'step_2': [True, False, False, True, False, True, True],
    'step_3': [False, False, False, False, False, True, True],
})
print(df)

  user  step_1  step_2  step_3
0    A    True    True   False
1    A    True   False   False
2    B    True   False   False
3    B    True    True   False
4    B    True   False   False
5    C    True    True    True
6    C    True    True    True

I would like to calculate what fraction of users reach each step. Some users have multiple observations, and the row order is not reliable, so I cannot simply call df.drop_duplicates(subset=['user']) and count the remaining rows.

In this case, the answer should be:

Step 1 = 1.00 (because A, B, and C all have a True in Step 1)
Step 2 = 1.00 (A, B, C)
Step 3 = 0.33 (C)

(I do not need to worry about any edge case in which a user goes from False in one step to True in a subsequent step within the same row.)

CodePudding user response:

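In your case, you can group by user, check whether any of that user's rows is True for each step, and then take the mean across users: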

df.groupby('user').any().mean()
Out[11]: 
step_1    1.000000
step_2    1.000000
step_3    0.333333
dtype: float64
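
To make the one-liner easier to follow, here is a minimal sketch that splits it into its intermediate steps, using the same df as in the question. The variable names (reached, fraction, naive) are only illustrative. The last part shows, for contrast, why drop_duplicates on its own is not reliable here: it keeps whichever row happens to come first for each user.

```python
import pandas as pd

df = pd.DataFrame({
    'user': ['A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'step_1': [True, True, True, True, True, True, True],
    'step_2': [True, False, False, True, False, True, True],
    'step_3': [False, False, False, False, False, True, True],
})

# Collapse to one row per user: a step is True if ANY of the
# user's observations reached it, regardless of row order.
reached = df.groupby('user').any()
print(reached)

# The mean of a boolean column is the share of True values,
# which here is the fraction of users who reached each step.
fraction = reached.mean()
print(fraction)

# For contrast: keeping only the first row per user depends on row order.
# User B's first row has step_2 == False, so step_2 is undercounted.
first_rows = df.drop_duplicates(subset=['user']).set_index('user')
naive = first_rows.mean()
print(naive)
```

With this data, the naive version reports step_2 as 2 out of 3 users rather than 3 out of 3, because user B's first observation happens to be one where step 2 was not reached.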