I have a dataframe:
df =
No. | Scenario | Exe Seq | Action |
---|---|---|---|
1 | A | 1 | a |
2 | A | 2 | b |
3 | A | 3 | c |
4 | A | 1 | a |
5 | A | 2 | b |
6 | A | 1 | a |
7 | A | 1 | a |
8 | A | 2 | b |
9 | B | 1 | z |
10 | B | 1 | z |
11 | B | 2 | c |
12 | B | 3 | z |
13 | B | 1 | z |
14 | B | 1 | z |
15 | B | 2 | c |
16 | B | 3 | z |
17 | B | 4 | f |
18 | B | 1 | z |
19 | B | 2 | c |
Some are same scenarios, but some reach three, but some stop at two or one. I want to distinguish these.
The "Scenario" values may have values other than "A"
start sequence always by 1
So I will get:
No. | Scenario | Exe Seq | Action | New_Scenario |
---|---|---|---|---|
1 | A | 1 | a | A_1 |
2 | A | 2 | b | A_1 |
3 | A | 3 | c | A_1 |
4 | A | 1 | a | A_2 |
5 | A | 2 | b | A_2 |
6 | A | 1 | a | A_3 |
7 | A | 1 | a | A_2 |
8 | A | 2 | b | A_2 |
9 | B | 1 | z | B_1 |
10 | B | 1 | z | B_2 |
11 | B | 2 | c | B_2 |
12 | B | 3 | z | B_2 |
13 | B | 1 | z | B_1 |
14 | B | 1 | z | B_3 |
15 | B | 2 | c | B_3 |
16 | B | 3 | z | B_3 |
17 | B | 4 | f | B_3 |
18 | B | 1 | z | B_4 |
19 | B | 2 | c | B_4 |
CodePudding user response:
First join values of Exe Seq
to strings and then call factorize
per secenario groups:
df['g'] = df['Exe Seq'].eq(1).cumsum()
df['New_Scenario'] = (df.groupby(['Scenario', 'g'])['Exe Seq']
.transform(lambda x: ','.join(x.astype(str))))
df['New_Scenario'] = (df['Scenario'] '_'
df.groupby('Scenario')['New_Scenario']
.transform(lambda x: pd.factorize(x)[0] 1)).astype(str)
print (df)
No. Scenario Exe Seq Action g New_Scenario
0 1 A 1 a 1 A_1
1 2 A 2 b 1 A_1
2 3 A 3 c 1 A_1
3 4 A 1 a 2 A_2
4 5 A 2 b 2 A_2
5 6 A 1 a 3 A_3
6 7 A 1 a 4 A_2
7 8 A 2 b 4 A_2
8 9 B 1 z 5 B_1
9 10 B 1 z 6 B_2
10 11 B 2 c 6 B_2
11 12 B 3 z 6 B_2
12 13 B 1 z 7 B_1
13 14 B 1 z 8 B_3
14 15 B 2 c 8 B_3
15 16 B 3 z 8 B_3
16 17 B 4 f 8 B_3
17 18 B 1 z 9 B_4
18 19 B 2 c 9 B_4