How to recursively assign group based on column values?

Time:10-18

I'm trying to assign a group number based on column values. Here is some hypothetical data:

import pandas as pd

dfdict = {
    'Day' : [1,2,3,4,5,6,7,1,2,3,4,5,6,7,8,9,10,1,2,3],
    'Week of year' : [5,5,5,5,5,5,5,10,10,10,10,10,10,10,11,11,11,15,15,15],
    'Month' : [2,2,2,2,2,2,2,5,5,5,5,5,5,5,5,5,5,6,6,6],
    'Year' : [2021, 2021, 2021, 2021, 2021, 2021, 2021, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022]
}

df = pd.DataFrame(dfdict)

How can I generate a week-of-month variable? The resulting data frame should look like this:

    Day  Week of year  Month  Year  Week of month
0     1             5      2  2021              1
1     2             5      2  2021              1
2     3             5      2  2021              1
3     4             5      2  2021              1
4     5             5      2  2021              1
5     6             5      2  2021              1
6     7             5      2  2021              1
7     1            10      5  2022              1
8     2            10      5  2022              1
9     3            10      5  2022              1
10    4            10      5  2022              1
11    5            10      5  2022              1
12    6            10      5  2022              1
13    7            10      5  2022              1
14    8            11      5  2022              2
15    9            11      5  2022              2
16   10            11      5  2022              2
17    1            15      6  2022              1
18    2            15      6  2022              1
19    3            15      6  2022              1

I tried a groupby but couldn't make it work. The following generates a cumulative count across the whole data frame, instead of the count that restarts within each month that I want:

df['Week of month'] = df.groupby(['Year', 'Month', 'Week of year']).ngroup()
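
For reference, here is a minimal sketch of what that attempt produces on the sample data, assuming the third grouping column is meant to be 'Week of year' (the name `global_id` is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Week of year': [5]*7 + [10]*7 + [11]*3 + [15]*3,
    'Month': [2]*7 + [5]*10 + [6]*3,
    'Year': [2021]*7 + [2022]*13,
})

# ngroup() numbers each distinct (Year, Month, Week of year) combination
# once across the whole frame, so the counter never restarts per month.
global_id = df.groupby(['Year', 'Month', 'Week of year']).ngroup()
print(global_id.tolist())
```

The ids run 0, 1, 2, 3 over the four distinct weeks, which is why week 15 in month 6 ends up with 3 rather than starting over.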

CodePudding user response:

Try factorize within each (Year, Month) group:

df['Week of month'] = df.groupby(['Year','Month'])['Week of year'].\
                          transform(lambda x : x.factorize()[0] + 1)
Out[273]: 
0     1
1     1
2     1
3     1
4     1
5     1
6     1
7     1
8     1
9     1
10    1
11    1
12    1
13    1
14    2
15    2
16    2
17    1
18    1
19    1
Name: Week of year, dtype: int64
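
To make the mechanics a bit more concrete, here is a small sketch of what factorize returns on its own, followed by a possible alternative using a dense rank per group. The dense-rank variant is my own suggestion, not part of the answer above, and the names `codes`, `uniques` and `week_of_month` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Week of year': [5]*7 + [10]*7 + [11]*3 + [15]*3,
    'Month': [2]*7 + [5]*10 + [6]*3,
    'Year': [2021]*7 + [2022]*13,
})

# factorize labels values in order of first appearance, starting at 0,
# which is why the answer adds 1 to get 1-based week numbers.
codes, uniques = pd.Series([10, 10, 11, 11, 10]).factorize()

# Alternative (assumption: weeks should be numbered by value, not by
# order of appearance): dense rank within each (Year, Month) group.
# rank() returns floats, so cast back to int.
week_of_month = (df.groupby(['Year', 'Month'])['Week of year']
                   .rank(method='dense')
                   .astype(int))
```

The two approaches agree when the rows of each month are already sorted by week, as in the sample data. They can differ on unsorted data, since factorize follows order of appearance while a dense rank follows the sorted week values.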

CodePudding user response:

Let's do a nested groupby

df['Week of month'] = (df.groupby(['Year', 'Month'])
                       .apply(lambda g: g.groupby('Week of year').ngroup().add(1))
                       .reset_index(drop=True))
print(df)

    Day  Week of year  Month  Year  Week of month
0     1             5      2  2021              1
1     2             5      2  2021              1
2     3             5      2  2021              1
3     4             5      2  2021              1
4     5             5      2  2021              1
5     6             5      2  2021              1
6     7             5      2  2021              1
7     1            10      5  2022              1
8     2            10      5  2022              1
9     3            10      5  2022              1
10    4            10      5  2022              1
11    5            10      5  2022              1
12    6            10      5  2022              1
13    7            10      5  2022              1
14    8            11      5  2022              2
15    9            11      5  2022              2
16   10            11      5  2022              2
17    1            15      6  2022              1
18    2            15      6  2022              1
19    3            15      6  2022              1
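
One caveat with the nested groupby: `reset_index(drop=True)` lines the result up by position, so it appears to rely on the frame having a default 0..n-1 index with each group's rows stored together. A hedged sketch of a variant that keeps the original index labels instead, using `group_keys=False` (the shuffled copy is only there to illustrate the point, and recent pandas versions may emit a deprecation warning about grouping columns in `apply`):

```python
import pandas as pd

df = pd.DataFrame({
    'Week of year': [5]*7 + [10]*7 + [11]*3 + [15]*3,
    'Month': [2]*7 + [5]*10 + [6]*3,
    'Year': [2021]*7 + [2022]*13,
})

# Shuffle the rows so index labels no longer match row positions
# (illustration only).
shuffled = df.sample(frac=1, random_state=0)

# group_keys=False keeps each row's original index label on the result,
# so the assignment aligns by label rather than by position.
shuffled['Week of month'] = (
    shuffled.groupby(['Year', 'Month'], group_keys=False)
            .apply(lambda g: g.groupby('Week of year').ngroup().add(1))
)
```

Since the inner `ngroup()` sorts the group keys by default, weeks are numbered by their value within each month here, regardless of row order.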