I went through different stackoverflow questions and finally posting it because I couldnt solve one of the issues I am facing. I have a dataframe like below
A B C
group1 group1_c 12
group1 group1_c 12
group1 group1_c 12
group1 group1_c 1
group1 group1_c 12
group1 group1_c 12
I have to match two rows together and whenever the value matches, I cumsum it. To do this,
df['cumul'] = df['C'].eq(df.groupby(['A','B'])['C'].shift(1).ffill()).groupby([df['A'],df['B']).cumsum()
Once I do this,
A B C Cumul
group1 group1_c 12 0
group1 group1_c 12 1
group1 group1_c 12 2
group1 group1_c 1 2
group1 group1_c 12 3
group1 group1_c 12 3
Whereas I want to reset if the condition is not met.Expected solution
A B C Cumul
group1 group1_c 12 0
group1 group1_c 12 1
group1 group1_c 12 2
group1 group1_c 1 0
group1 group1_c 12 0
group1 group1_c 12 1
Please advice Thank you
CodePudding user response:
If need count groups per consecutive values of C
column use Series.ne
with Series.shift
and cumulative sum, last use counter by GroupBy.cumcount
:
df['cumul'] = df.groupby(df['C'].ne(df['C'].shift()).cumsum()).cumcount()
print (df)
A B C cumul
0 group1 group1_c 12 0
1 group1 group1_c 12 1
2 group1 group1_c 12 2
3 group1 group1_c 1 0
4 group1 group1_c 12 0
5 group1 group1_c 12 1
If need per A, B
groups also add both groups:
print (df)
A B C
0 group1 group1_c 12
1 group1 group2_c 12 <-changed groups
2 group1 group2_c 12 <-changed groups
3 group1 group1_c 1
4 group1 group1_c 12
5 group1 group1_c 12
s = df['C'].ne(df['C'].shift()).cumsum()
df['cumul'] = df.groupby([df['A'],df['B'], s]).cumcount()
df['cumul1'] = df.groupby(df['C'].ne(df['C'].shift()).cumsum()).cumcount()
print (df)
A B C cumul cumul1
0 group1 group1_c 12 0 0
1 group1 group2_c 12 0 1
2 group1 group2_c 12 1 2
3 group1 group1_c 1 0 0
4 group1 group1_c 12 0 0
5 group1 group1_c 12 1 1
Alternative solution:
s = df[['A','B','C']].ne(df[['A','B','C']].shift()).any(axis=1).cumsum()
df['cumul'] = df.groupby(s).cumcount()