How to mark values while keep count each time we see a value?-CodePudding

does anyone know a way to count values while annotating them? For example.

We have a reparative state with 3 different values. A, B, C. We want to count how many times we saw state A but values A are different length of A's in a column. We have Column A and want to get column B. So we have time series A, B, C and we want to count time wise that A is 1 then B is 1 and C is 1 but when we again see A is 2 and B is 2 and C is 2 and the A again is 3 and so on... (The order is always the same first A then B then C and again A...)

Any ideas?

Column A	Column B
A	1
A	1
A	1
B	1
B	1
C	1
A	2
B	2
C	2
A	3
A	3
A	3
A	3
A	3
A	3
B	3
B	3
B	3
C	3
C	3

Tried to get a loop but don't know how to count the state.

CodePudding user response：

Try:

from itertools import count
from collections import defaultdict

d = defaultdict(count)

df['Column B new'] = df.groupby((df['Column A'] != df['Column A'].shift()).cumsum())['Column A'].transform(lambda x: next(d[x.iat[0]])   1)
print(df)

Prints:

   Column A  Column B  Column B new
0         A         1             1
1         A         1             1
2         A         1             1
3         B         1             1
4         B         1             1
5         C         1             1
6         A         2             2
7         B         2             2
8         C         2             2
9         A         3             3
10        A         3             3
11        A         3             3
12        A         3             3
13        A         3             3
14        A         3             3
15        B         3             3
16        B         3             3
17        B         3             3
18        C         3             3
19        C         3             3

CodePudding user response：

Here is a one way to do it :

N = len(df["Column A"].unique())

# is the current ts equal to the previous ? if so, cumsum
cs = df["Column A"].ne(df["Column A"].shift().bfill()).cumsum() 

df["Column B"] = (cs // N   1).astype(int)

Output :

print(df)

   Column A  Column B
0         A         1
1         A         1
2         A         1
3         B         1
4         B         1
5         C         1
6         A         2
7         B         2
8         C         2
9         A         3
10        A         3
11        A         3
12        A         3
13        A         3
14        A         3
15        B         3
16        B         3
17        B         3
18        C         3
19        C         3

CodePudding user response：

A basic approach without any modules. A for loop iterates through each value in column_a. The counter variable is incremented only if the current value is 'A' and the previous value is not 'A'. This ensures that the counter is only incremented once when a new group of 'A','B','C' begins. The current value of counter is appended to column_b for each iteration. Finally, the value of prev_value is updated to the current value for the next iteration.

column_b = []
counter, prev_value = 0, ''
for value in column_a:
    counter  = value == 'A' and prev_value != 'A'
    column_b.append(counter)
    prev_value = value

print(column_b)
[1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]