I am looking for a simple solution to enumerate subgroups in a pandas dataframe based on their occurance. Here is a sample dataframe and the results I want to achieve. Its a bit like ngroup(), but counter resets every major group. Data is grouped by 'LOT' and 'TDATE'.
import pandas as pd
df = pd.DataFrame({
"LOT": ['A', 'A', 'A', 'A', 'A', 'A','B', 'B', 'B','B', 'B', 'B'],
"TDATE": ['01.01.2022', '01.01.2022', '02.01.2022', '02.01.2022', '03.01.2022', '03.01.2022', '02.01.2022', '02.01.2022', '03.01.2022','03.01.2022', '06.01.2022', '06.01.2022'],
"MOD": ['TP1', 'TP2', 'TP1', 'TP2', 'TP1', 'TP2','TP1', 'TP2','TP1', 'TP2','TP1', 'TP2'],
"MWERT": [1.2, 2.3, 3.5, 4.6, 5.1, 6.4, 7.3, 2.8, 1.9, 1.10, 1.1, 1.2]})
Results of df['ENUM'] = df.groupby(['LOT','TDATE']).ngroup()
LOT TDATE MOD MWERT ENUM
0 A 01.01.2022 TP1 1.2 0
1 A 01.01.2022 TP2 2.3 0
2 A 02.01.2022 TP1 3.5 1
3 A 02.01.2022 TP2 4.6 1
4 A 03.01.2022 TP1 5.1 2
5 A 03.01.2022 TP2 6.4 2
6 B 02.01.2022 TP1 7.3 3
7 B 02.01.2022 TP2 2.8 3
8 B 03.01.2022 TP1 1.9 4
9 B 03.01.2022 TP2 1.1 4
10 B 06.01.2022 TP1 1.1 5
11 B 06.01.2022 TP2 1.2 5
but this is what I need:
LOT TDATE MOD MWERT ENUM
0 A 01.01.2022 TP1 1.2 0
1 A 01.01.2022 TP2 2.3 0
2 A 02.01.2022 TP1 3.5 1
3 A 02.01.2022 TP2 4.6 1
4 A 03.01.2022 TP1 5.1 2
5 A 03.01.2022 TP2 6.4 2
6 B 02.01.2022 TP1 7.3 0
7 B 02.01.2022 TP2 2.8 0
8 B 03.01.2022 TP1 1.9 1
9 B 03.01.2022 TP2 1.1 1
10 B 06.01.2022 TP1 1.1 2
11 B 06.01.2022 TP2 1.2 2
As you can see, the ngroup() "resets" for any new 'LOT'.
CodePudding user response:
You can use a nested groupby:
df['ENUM'] = (df.groupby('LOT')
.apply(lambda g: g.groupby('TDATE').ngroup())
.droplevel(0)
)
output:
LOT TDATE MOD MWERT ENUM
0 A 01.01.2022 TP1 1.2 0
1 A 01.01.2022 TP2 2.3 0
2 A 02.01.2022 TP1 3.5 1
3 A 02.01.2022 TP2 4.6 1
4 A 03.01.2022 TP1 5.1 2
5 A 03.01.2022 TP2 6.4 2
6 B 02.01.2022 TP1 7.3 0
7 B 02.01.2022 TP2 2.8 0
8 B 03.01.2022 TP1 1.9 1
9 B 03.01.2022 TP2 1.1 1
10 B 06.01.2022 TP1 1.1 2
11 B 06.01.2022 TP2 1.2 2