Home > other >  Enumerate sub groups based on occurence
Enumerate sub groups based on occurence

Time:07-06

I am looking for a simple solution to enumerate subgroups in a pandas dataframe based on their occurance. Here is a sample dataframe and the results I want to achieve. Its a bit like ngroup(), but counter resets every major group. Data is grouped by 'LOT' and 'TDATE'.

import pandas as pd

df = pd.DataFrame({

       "LOT": ['A', 'A', 'A', 'A', 'A', 'A','B', 'B', 'B','B', 'B', 'B'],

       "TDATE": ['01.01.2022', '01.01.2022', '02.01.2022', '02.01.2022', '03.01.2022', '03.01.2022', '02.01.2022', '02.01.2022', '03.01.2022','03.01.2022', '06.01.2022', '06.01.2022'],

       "MOD": ['TP1', 'TP2', 'TP1', 'TP2', 'TP1', 'TP2','TP1', 'TP2','TP1', 'TP2','TP1', 'TP2'],

       "MWERT": [1.2, 2.3, 3.5, 4.6, 5.1, 6.4, 7.3, 2.8, 1.9, 1.10, 1.1, 1.2]})

Results of df['ENUM'] = df.groupby(['LOT','TDATE']).ngroup()

   LOT       TDATE  MOD  MWERT  ENUM
0    A  01.01.2022  TP1    1.2     0
1    A  01.01.2022  TP2    2.3     0
2    A  02.01.2022  TP1    3.5     1
3    A  02.01.2022  TP2    4.6     1
4    A  03.01.2022  TP1    5.1     2
5    A  03.01.2022  TP2    6.4     2
6    B  02.01.2022  TP1    7.3     3
7    B  02.01.2022  TP2    2.8     3
8    B  03.01.2022  TP1    1.9     4
9    B  03.01.2022  TP2    1.1     4
10   B  06.01.2022  TP1    1.1     5
11   B  06.01.2022  TP2    1.2     5

but this is what I need:

   LOT       TDATE  MOD  MWERT  ENUM
0    A  01.01.2022  TP1    1.2     0
1    A  01.01.2022  TP2    2.3     0
2    A  02.01.2022  TP1    3.5     1
3    A  02.01.2022  TP2    4.6     1
4    A  03.01.2022  TP1    5.1     2
5    A  03.01.2022  TP2    6.4     2
6    B  02.01.2022  TP1    7.3     0
7    B  02.01.2022  TP2    2.8     0
8    B  03.01.2022  TP1    1.9     1
9    B  03.01.2022  TP2    1.1     1
10   B  06.01.2022  TP1    1.1     2
11   B  06.01.2022  TP2    1.2     2

As you can see, the ngroup() "resets" for any new 'LOT'.

CodePudding user response:

You can use a nested groupby:

df['ENUM'] = (df.groupby('LOT')
                .apply(lambda g: g.groupby('TDATE').ngroup())
                .droplevel(0)
              )

output:

   LOT       TDATE  MOD  MWERT  ENUM
0    A  01.01.2022  TP1    1.2     0
1    A  01.01.2022  TP2    2.3     0
2    A  02.01.2022  TP1    3.5     1
3    A  02.01.2022  TP2    4.6     1
4    A  03.01.2022  TP1    5.1     2
5    A  03.01.2022  TP2    6.4     2
6    B  02.01.2022  TP1    7.3     0
7    B  02.01.2022  TP2    2.8     0
8    B  03.01.2022  TP1    1.9     1
9    B  03.01.2022  TP2    1.1     1
10   B  06.01.2022  TP1    1.1     2
11   B  06.01.2022  TP2    1.2     2
  • Related