does anyone know a way to count values while annotating them? For example.
We have a reparative state with 3 different values. A, B, C. We want to count how many times we saw state A but values A are different length of A's in a column. We have Column A and want to get column B. So we have time series A, B, C and we want to count time wise that A is 1 then B is 1 and C is 1 but when we again see A is 2 and B is 2 and C is 2 and the A again is 3 and so on... (The order is always the same first A then B then C and again A...)
Any ideas?
Column A | Column B |
---|---|
A | 1 |
A | 1 |
A | 1 |
B | 1 |
B | 1 |
C | 1 |
A | 2 |
B | 2 |
C | 2 |
A | 3 |
A | 3 |
A | 3 |
A | 3 |
A | 3 |
A | 3 |
B | 3 |
B | 3 |
B | 3 |
C | 3 |
C | 3 |
Tried to get a loop but don't know how to count the state.
CodePudding user response:
Try:
from itertools import count
from collections import defaultdict
d = defaultdict(count)
df['Column B new'] = df.groupby((df['Column A'] != df['Column A'].shift()).cumsum())['Column A'].transform(lambda x: next(d[x.iat[0]]) 1)
print(df)
Prints:
Column A Column B Column B new
0 A 1 1
1 A 1 1
2 A 1 1
3 B 1 1
4 B 1 1
5 C 1 1
6 A 2 2
7 B 2 2
8 C 2 2
9 A 3 3
10 A 3 3
11 A 3 3
12 A 3 3
13 A 3 3
14 A 3 3
15 B 3 3
16 B 3 3
17 B 3 3
18 C 3 3
19 C 3 3
CodePudding user response:
Here is a one way to do it :
N = len(df["Column A"].unique())
# is the current ts equal to the previous ? if so, cumsum
cs = df["Column A"].ne(df["Column A"].shift().bfill()).cumsum()
df["Column B"] = (cs // N 1).astype(int)
Output :
print(df)
Column A Column B
0 A 1
1 A 1
2 A 1
3 B 1
4 B 1
5 C 1
6 A 2
7 B 2
8 C 2
9 A 3
10 A 3
11 A 3
12 A 3
13 A 3
14 A 3
15 B 3
16 B 3
17 B 3
18 C 3
19 C 3
CodePudding user response:
A basic approach without any modules. A for
loop iterates through each value in column_a
. The counter
variable is incremented only if the current value is 'A'
and the previous value is not 'A'
. This ensures that the counter is only incremented once when a new group of 'A','B','C'
begins. The current value of counter is appended to column_b
for each iteration. Finally, the value of prev_value
is updated to the current value for the next iteration.
column_b = []
counter, prev_value = 0, ''
for value in column_a:
counter = value == 'A' and prev_value != 'A'
column_b.append(counter)
prev_value = value
print(column_b)
[1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]