Sum column and store in a separate column in Pandas-CodePudding

I have 2 columns in my dataframe, Col A and Col B. I want to create Col C.

The logic of Col C - For each unique value in Col A, when Col B has a '1', '2' or '5' then keep incrementing the number in Col C. For all other numbers store a value of 0 in Col C.

Col A	Col B	Col C
A1	0	0
A1	1	1
A1	5	2
A1	1	3
A1	2	4
B1	1	1
B1	2	2
B1	3	0
C1	1	1
C1	0	0

I was able to identify if Col A row 1 is same as row 2 using this

df['Col A'].shift(-1)==df['Col A']

and i can use an if condition on Col B to identify if its a 1 or a 2 or 5

df.loc[(df["Col B"] == 1 | (df["Col B"] == 2) | (df["Col B"] == 5)]

but don't know how to combine these 2 to get the output in Col C.

CodePudding user response：

IIUC, if you want to cumcount except when a non 1/2/5:

m = df['Col B'].isin((1,2,5))
df['Col C'] = m.groupby([df['Col A'], ~m]).cumsum()

Or if you want to systematically reset when a non 1/2/5 is found:

df['Col C'] = m.groupby([df['Col A'], (~m).cumsum()]).cumsum()

output:

  Col A  Col B  Col C
0    A1      0      0
1    A1      1      1
2    A1      5      2
3    A1      1      3
4    A1      2      4
5    B1      1      1
6    B1      2      2
7    B1      3      0
8    C1      1      1
9    C1      0      0

comparison of the two options

As pointed out by @jezrael the question is ambiguous on this point, so you can pick the alternative you need:

m = df['Col B'].isin((1,2,5))

df['Col C (continue)'] = m.groupby([df['Col A'], ~m]).cumsum()
df['Col C (reset)'] = m.groupby([df['Col A'], (~m).cumsum()]).cumsum()

output:

   Col A  Col B  Col C (continue)  Col C (reset)
0     A1      0                 0              0
1     A1      1                 1              1
2     A1      5                 2              2
3     A1      1                 3              3
4     A1      2                 4              4
5     B1      1                 1              1
6     B1      2                 2              2
7     B1      3                 0              0
8     B1      2                 3              1    # difference
9     C1      1                 1              1
10    C1      0                 0              0