Add column that keeps count of distinct values grouped by a variable in pandas

I have a pandas dataframe called df that looks like this:

name   test_type   block  
joe    0           1                
joe    0           1            
joe    1           2            
joe    1           2            
joe    0           3            
joe    0           3            
jim    1           1            
jim    1           1            
jim    0           2            
jim    0           2           
jim    1           3            
jim    1           3

I want to add a column that counts up each time a new value of block appears for each person in name, with a separate count for each test_type.

Here is what I need:

name   test_type   block   block_by_test_type
joe    0           1       1         
joe    0           1       1     
joe    1           2       1     
joe    1           2       1     
joe    0           3       2     
joe    0           3       2     
jim    1           1       1     
jim    1           1       1     
jim    0           2       1     
jim    0           2       1     
jim    1           3       2     
jim    1           3       2

I've been using groupby and cumsum, but I can't manage to get exactly what I need.

Thanks in advance.

CodePudding user response:

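Looks like you were close with groupby and cumsum. duplicated makes it all come together: it marks repeated block values within each group, so negating it and taking the cumsum counts each new block value once.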

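# Within each (name, test_type) group, flag the first occurrence of each
# block value and take a running sum of those flags.
df['block_by_test_type'] = (
    df.groupby(['name', 'test_type'], as_index=False)
        .apply(lambda x: (~x['block'].duplicated()).cumsum())
        .droplevel(0)  # drop the group level so the index matches df
)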

print(df)
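
To see what the lambda does inside a single group, here is a small sketch on one group's block values. The Series below is typed in by hand from the joe / test_type 0 rows of the question, and the names is_first and counter are only for illustration.

```python
import pandas as pd

# block values for one (name, test_type) group: joe with test_type 0
block = pd.Series([1, 1, 3, 3])

# True where a block value appears for the first time in the group
is_first = ~block.duplicated()

# running count of distinct block values seen so far
counter = is_first.cumsum()

print(is_first.tolist())  # [True, False, True, False]
print(counter.tolist())   # [1, 1, 2, 2]
```

Note that a block value that comes back later in the same group would not bump the counter again, since duplicated only treats the first occurrence as new.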

Result

   name  test_type  block  block_by_test_type
0   joe          0      1                   1
1   joe          0      1                   1
2   joe          1      2                   1
3   joe          1      2                   1
4   joe          0      3                   2
5   joe          0      3                   2
6   jim          1      1                   1
7   jim          1      1                   1
8   jim          0      2                   1
9   jim          0      2                   1
10  jim          1      3                   2
11  jim          1      3                   2
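
For anyone who wants to reproduce this from scratch, below is a self-contained sketch that rebuilds the sample frame from the question and fills the column with groupby plus transform instead of apply. This is a variation on the answer above, not the exact same code: transform returns a result aligned to the original index, so no droplevel step should be needed. It assumes the same sample data and column names as the question.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "name": ["joe"] * 6 + ["jim"] * 6,
    "test_type": [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1],
    "block": [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3],
})

# For each (name, test_type) group, count the distinct block values seen
# so far; transform keeps the result aligned with df's index.
df["block_by_test_type"] = (
    df.groupby(["name", "test_type"])["block"]
      .transform(lambda s: (~s.duplicated()).cumsum())
)

print(df)
```

On the sample data this should give the same block_by_test_type column as the Result shown above.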