I'm trying to assign a group based on column values. Here is some hypothetical data.
import pandas as pd

dfdict = {
    'Day' : [1,2,3,4,5,6,7,1,2,3,4,5,6,7,8,9,10,1,2,3],
    'Week of year' : [5,5,5,5,5,5,5,10,10,10,10,10,10,10,11,11,11,15,15,15],
    'Month' : [2,2,2,2,2,2,2,5,5,5,5,5,5,5,5,5,5,6,6,6],
    'Year' : [2021, 2021, 2021, 2021, 2021, 2021, 2021, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022]
}
df = pd.DataFrame(dfdict)
How can I generate a variable for the week of the month? The resulting data frame should look like this:
    Day  Week of year  Month  Year  Week of month
0     1             5      2  2021              1
1     2             5      2  2021              1
2     3             5      2  2021              1
3     4             5      2  2021              1
4     5             5      2  2021              1
5     6             5      2  2021              1
6     7             5      2  2021              1
7     1            10      5  2022              1
8     2            10      5  2022              1
9     3            10      5  2022              1
10    4            10      5  2022              1
11    5            10      5  2022              1
12    6            10      5  2022              1
13    7            10      5  2022              1
14    8            11      5  2022              2
15    9            11      5  2022              2
16   10            11      5  2022              2
17    1            15      6  2022              1
18    2            15      6  2022              1
19    3            15      6  2022              1
I tried a few groupby approaches but couldn't make them work. The following generates a cumulative count across the whole data frame, instead of the count that restarts within each month that I want:
df['Week of month'] = df.groupby(['Year', 'Month', 'Week of year']).ngroup()
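To illustrate the behaviour described above: ngroup() numbers every distinct combination of the grouping keys once across the whole frame, so the numbering never restarts when the month changes. A minimal sketch on a shortened, hypothetical version of the sample data (the variable name global_ids is only for illustration):

```python
import pandas as pd

# Shortened, hypothetical version of the sample data (one or two rows per week).
df = pd.DataFrame({
    'Week of year': [5, 5, 10, 10, 11, 15],
    'Month': [2, 2, 5, 5, 5, 6],
    'Year': [2021, 2021, 2022, 2022, 2022, 2022],
})

# ngroup() assigns one id per (Year, Month, Week of year) combination,
# counted over the whole frame rather than within each month.
global_ids = df.groupby(['Year', 'Month', 'Week of year']).ngroup()
print(global_ids.tolist())  # [0, 0, 1, 1, 2, 3]
```

The last row belongs to the first week of June but gets id 3, which is the cumulative numbering rather than a per-month one.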
CodePudding user response:
Try factorize within each Year/Month group:
df['Week of month'] = df.groupby(['Year','Month'])['Week of year'].\
    transform(lambda x : x.factorize()[0] + 1)
Out[273]:
0     1
1     1
2     1
3     1
4     1
5     1
6     1
7     1
8     1
9     1
10    1
11    1
12    1
13    1
14    2
15    2
16    2
17    1
18    1
19    1
Name: Week of year, dtype: int64
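For anyone who wants to run this end to end, here is a self-contained sketch of the same factorize idea. The helper name week_of_month is hypothetical, and the data is the sample from the question without the Day column. factorize returns zero-based codes in order of first appearance, so adding 1 makes the numbering start at 1 within each group.

```python
import pandas as pd

# Sample data from the question, minus the Day column.
df = pd.DataFrame({
    'Week of year': [5] * 7 + [10] * 7 + [11] * 3 + [15] * 3,
    'Month': [2] * 7 + [5] * 10 + [6] * 3,
    'Year': [2021] * 7 + [2022] * 13,
})

def week_of_month(weeks: pd.Series) -> pd.Series:
    # Hypothetical helper: label each distinct week 1, 2, 3, ...
    # in the order it first appears within the group.
    codes, _ = pd.factorize(weeks)
    return pd.Series(codes + 1, index=weeks.index)

df['Week of month'] = (
    df.groupby(['Year', 'Month'])['Week of year'].transform(week_of_month)
)
print(df['Week of month'].tolist())
```

Because the codes follow order of appearance, this assumes the rows within each month are already in chronological order, as they are in the sample data.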
CodePudding user response:
Let's do a nested groupby:
df['Week of month'] = (df.groupby(['Year', 'Month'])
                         .apply(lambda g: g.groupby('Week of year').ngroup().add(1))
                         .reset_index(drop=True))
print(df)
print(df)
    Day  Week of year  Month  Year  Week of month
0     1             5      2  2021              1
1     2             5      2  2021              1
2     3             5      2  2021              1
3     4             5      2  2021              1
4     5             5      2  2021              1
5     6             5      2  2021              1
6     7             5      2  2021              1
7     1            10      5  2022              1
8     2            10      5  2022              1
9     3            10      5  2022              1
10    4            10      5  2022              1
11    5            10      5  2022              1
12    6            10      5  2022              1
13    7            10      5  2022              1
14    8            11      5  2022              2
15    9            11      5  2022              2
16   10            11      5  2022              2
17    1            15      6  2022              1
18    2            15      6  2022              1
19    3            15      6  2022              1
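One thing to watch with the nested version: as far as I can tell, reset_index(drop=True) renumbers the result in group order, so it only lines up with df when the rows are already sorted by Year and Month with a default 0..n-1 index, as in the sample. A possible alternative, not from either answer above, is a dense rank within each group, which keeps the original index. A sketch under those assumptions (the shuffle is only there to show that row order does not matter):

```python
import pandas as pd

# Sample data from the question, minus the Day column.
df = pd.DataFrame({
    'Week of year': [5] * 7 + [10] * 7 + [11] * 3 + [15] * 3,
    'Month': [2] * 7 + [5] * 10 + [6] * 3,
    'Year': [2021] * 7 + [2022] * 13,
})

# Shuffle the rows to show the result does not depend on row order.
shuffled = df.sample(frac=1, random_state=0)

# Dense rank: the smallest week number in each (Year, Month) group gets 1,
# the next distinct week gets 2, and so on. rank() returns floats.
shuffled['Week of month'] = (
    shuffled.groupby(['Year', 'Month'])['Week of year']
    .rank(method='dense')
    .astype(int)
)

# Restore the original row order for comparison with the expected output.
result = shuffled.sort_index()
print(result['Week of month'].tolist())
```

The trade-off is that ranking orders by the week number itself, so it would likely mislabel months where the week of year wraps around (for example ISO week 1 falling in late December); the order-of-appearance approach does not have that problem when the rows are chronological.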