I was hoping if anyone could help me with this use-case:
I want to generate dates between two dates and then tag each date with week number, then add both new generated dates and week number as new columns to the original dataframe and map it to user id.
this is the existing dataframe:
user_id | start_dt | end_dt |
---|---|---|
1 | 2022-01-01 | 2022-02-01 |
2 | 2022-01-14 | 2022-03-14 |
3 | 2022-01-05 | 2022-02-05 |
4 | 2022-01-25 | 2022-02-25 |
generating dates between start and end date and tag date with wk number
user_id | date | week_nbr |
---|---|---|
1 | 2022-01-01 | w1 |
1 | 2022-01-02 | w1 |
1 | 2022-01-03 | w1 |
1 | 2022-01-04 | w1 |
1 | 2022-01-05 | w1 |
1 | 2022-01-06 | w1 |
1 | 2022-01-07 | w1 |
1 | 2022-01-08 | w2 |
Finally map the generated wk and dates back to the original table using user_id:
user_id | start_dt | end_dt | date | week_nbr |
---|---|---|---|---|
1 | 2022-01-01 | 2022-02-01 | 2022-01-01 | w1 |
1 | 2022-01-01 | 2022-02-01 | 2022-01-02 | w1 |
1 | 2022-01-01 | 2022-02-01 | 2022-01-03 | w1 |
1 | 2022-01-01 | 2022-02-01 | 2022-01-04 | w1 |
1 | 2022-01-01 | 2022-02-01 | 2022-01-05 | w1 |
1 | 2022-01-01 | 2022-02-01 | 2022-01-06 | w1 |
1 | 2022-01-01 | 2022-02-01 | 2022-01-07 | w1 |
1 | 2022-01-01 | 2022-02-01 | 2022-01-08 | w2 |
Any thoughts?
CodePudding user response:
I believe this should give you what you are looking for:
(df.assign(
date = [pd.date_range(i,j) for i,j in zip(df['start_dt'],df['end_dt'])]).explode('date')
.assign(week_nbr = lambda x: x.groupby('user_id')['date']
.diff()
.dt.days
.cumsum()
.floordiv(7)
.add(1,fill_value=0)
.astype(int)
.map('w{}'.format))
.reset_index(drop=True))
Output: (top 5 rows)
user_id start_dt end_dt date week_nbr
0 1 2022-01-01 2022-02-01 2022-01-01 w1
1 1 2022-01-01 2022-02-01 2022-01-02 w1
2 1 2022-01-01 2022-02-01 2022-01-03 w1
3 1 2022-01-01 2022-02-01 2022-01-04 w1
4 1 2022-01-01 2022-02-01 2022-01-05 w1