I have a pandas DataFrame called df that looks like this:
name test_type block
joe 0 1
joe 0 1
joe 1 2
joe 1 2
joe 0 3
joe 0 3
jim 1 1
jim 1 1
jim 0 2
jim 0 2
jim 1 3
jim 1 3
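A minimal way to reproduce the frame above, assuming name holds strings and test_type and block hold plain integers (the dtypes are an assumption, since the post only shows the printed values):

```python
import pandas as pd

# Sample data copied from the table above; dtypes are assumed.
df = pd.DataFrame({
    "name": ["joe"] * 6 + ["jim"] * 6,
    "test_type": [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1],
    "block": [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3],
})
print(df)
```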
I want to add a column that counts up every time a new value of block appears for each person under name, with a separate count for each test_type.
Here is what I need:
name test_type block block_by_test_type
joe 0 1 1
joe 0 1 1
joe 1 2 1
joe 1 2 1
joe 0 3 2
joe 0 3 2
jim 1 1 1
jim 1 1 1
jim 0 2 1
jim 0 2 1
jim 1 3 2
jim 1 3 2
I've been using groupby and cumsum, but I can't manage to get exactly what I need.
Thanks in advance.
CodePudding user response:
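Looks like you were close with groupby and cumsum. duplicated makes it all come together: it flags repeated block values within each group, so negating it and taking the cumulative sum counts each new block.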
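# Within each (name, test_type) group, flag the first row of every block value,
# then take the running total of those flags.
df['block_by_test_type'] = (
    df.groupby(['name', 'test_type'], as_index=False)
    .apply(lambda x: (~x['block'].duplicated()).cumsum())
    .droplevel(0)  # drop the group level so the result aligns with df's index
)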
print(df)
Result
name test_type block block_by_test_type
0 joe 0 1 1
1 joe 0 1 1
2 joe 1 2 1
3 joe 1 2 1
4 joe 0 3 2
5 joe 0 3 2
6 jim 1 1 1
7 jim 1 1 1
8 jim 0 2 1
9 jim 0 2 1
10 jim 1 3 2
11 jim 1 3 2
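To see why this works, it can help to look at one group on its own and then apply the same idea to all groups. The sketch below is not the code from the answer: it swaps apply for transform, which keeps the original index and so needs no droplevel. The frame is rebuilt from the question's sample data, and the names grp, first_seen, and counter are only illustrative.

```python
import pandas as pd

# Rebuild the sample frame from the question (dtypes assumed).
df = pd.DataFrame({
    "name": ["joe"] * 6 + ["jim"] * 6,
    "test_type": [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1],
    "block": [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3],
})

# One group in isolation: joe's rows with test_type == 0 (blocks 1, 1, 3, 3).
grp = df.loc[(df["name"] == "joe") & (df["test_type"] == 0), "block"]
first_seen = ~grp.duplicated()  # True on the first row of each block value
counter = first_seen.cumsum()   # running count of distinct blocks so far

# Same logic for every (name, test_type) group; transform returns a Series
# aligned with df's index, so it can be assigned directly.
df["block_by_test_type"] = (
    df.groupby(["name", "test_type"])["block"]
    .transform(lambda s: (~s.duplicated()).cumsum())
)
print(df)
```

Note that duplicated counts distinct block values within a group, so a block value that reappears later in the same group would not start a new count. That matches the sample data, where each block belongs to a single run of rows.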