My intial Dataframe looks as follows (except column Counter
):
Index | User | Status | Counter |
---|---|---|---|
1 | John | A | 1 |
2 | Ellen | A | 1 |
3 | John | B | 0 |
4 | Ellen | A | 2 |
5 | John | A | 1 |
6 | John | A | 2 |
7 | John | A | 3 |
8 | John | A | 4 |
9 | Ellen | A | 3 |
10 | John | B | 3 |
11 | Ellen | B | 2 |
12 | Ellen | C | 1 |
13 | Ellen | A | 2 |
14 | Ellen | A | 3 |
In this case I have two users (John/Ellen
). In fact, there are way more users.
The Counter
column is my goal to achieve. If I had only one user, the code would look like this:
count = 0
CounterList = []
for i, row in df.iterrows():
if row["Status"] == "A":
count = 1
elif row["Status"] == "B" or row["Status"] == "C":
count -= 1
CounterList.append(count)
df["Counter"] = CounterList
df
With status A it counts up by 1, with status B or C the counter is reduced by one.
But how to handle two or more users? How to build subgroups and counting each user-subgroup separately?
CodePudding user response:
You can check if the Status is equal to A and map 1, -1 otherwise. Then perform a cumsum per group:
df['Counter'] = (df['Status']
.eq('A').map({True: 1, False: -1}) # make A: 1, other: -1
.groupby(df['User']).cumsum() # cumsum per group
)
output:
Index User Status Counter
0 1 John A 1
1 2 Ellen A 1
2 3 John B 0
3 4 Ellen A 2
4 5 John A 1
5 6 John A 2
6 7 John A 3
7 8 John A 4
8 9 Ellen A 3
9 10 John B 3
10 11 Ellen B 2
11 12 Ellen C 1
12 13 Ellen A 2
13 14 Ellen A 3
CodePudding user response:
Create a mapping from Status to its corresponding score; then groupby
cumsum
:
mapping = {'A':1, 'B':-1,'C':-1}
df['Counter'] = df.assign(Counter=df['Status'].map(mapping)).groupby('User')['Counter'].cumsum()
Output:
Index User Status Counter
0 1 John A 1
1 2 Ellen A 1
2 3 John B 0
3 4 Ellen A 2
4 5 John A 1
5 6 John A 2
6 7 John A 3
7 8 John A 4
8 9 Ellen A 3
9 10 John B 3
10 11 Ellen B 2
11 12 Ellen C 1
12 13 Ellen A 2
13 14 Ellen A 3