I have the following dataframe:
id | date_start | date_end | reporting_month | reporting_month_number | months_length |
---|---|---|---|---|---|
1 | 2022-03-31 23:56:22 | 2022-05-01 23:56:22 | 2022-03 | 1 | 3 |
2 | 2022-03-31 23:48:48 | 2022-06-01 23:48:48 | 2022-03 | 1 | 4 |
3 | 2022-03-31 23:47:36 | 2022-08-01 23:47:36 | 2022-03 | 1 | 6 |
I would like to split each id row so I can have a row for each of the months_length
, starting on the date of reporting_month
, like this:
id | date_start | date_end | reporting_month | reporting_month_number | months_length |
---|---|---|---|---|---|
1 | 2022-03-31 23:56:22 | 2022-05-01 23:56:22 | 2022-03 | 1 | 3 |
1 | 2022-03-31 23:56:22 | 2022-05-01 23:56:22 | 2022-04 | 2 | 3 |
1 | 2022-03-31 23:56:22 | 2022-05-01 23:56:22 | 2022-05 | 3 | 3 |
2 | 2022-03-31 23:48:48 | 2022-06-01 23:48:48 | 2022-03 | 1 | 4 |
2 | 2022-03-31 23:48:48 | 2022-06-01 23:48:48 | 2022-03 | 2 | 4 |
2 | 2022-03-31 23:48:48 | 2022-06-01 23:48:48 | 2022-04 | 3 | 4 |
2 | 2022-03-31 23:48:48 | 2022-06-01 23:48:48 | 2022-05 | 4 | 4 |
3 | 2022-03-31 23:47:36 | 2022-08-01 23:47:36 | 2022-03 | 1 | 6 |
3 | 2022-03-31 23:47:36 | 2022-08-01 23:47:36 | 2022-04 | 2 | 6 |
3 | 2022-03-31 23:47:36 | 2022-08-01 23:47:36 | 2022-05 | 3 | 6 |
3 | 2022-03-31 23:47:36 | 2022-08-01 23:47:36 | 2022-06 | 4 | 6 |
3 | 2022-03-31 23:47:36 | 2022-08-01 23:47:36 | 2022-07 | 5 | 6 |
3 | 2022-03-31 23:47:36 | 2022-08-01 23:47:36 | 2022-08 | 6 | 6 |
I have tried several approaches but I can't seem to reach my objective.
Does anyone have a suggestion on how to achieve this?
Thanks.
CodePudding user response:
One possible solution is,
df= df.loc[df.index.repeat(df['months_length'])].reset_index(drop=True)
df['reporting_month_number'] = df.groupby('id')['reporting_month_number'].cumsum()
O/P:
id date_start date_end reporting_month \
0 1 2022-03-31 23:56:22 2022-05-01 23:56:22 2022-03
1 1 2022-03-31 23:56:22 2022-05-01 23:56:22 2022-03
2 1 2022-03-31 23:56:22 2022-05-01 23:56:22 2022-03
3 2 2022-03-31 23:48:48 2022-06-01 23:48:48 2022-03
4 2 2022-03-31 23:48:48 2022-06-01 23:48:48 2022-03
5 2 2022-03-31 23:48:48 2022-06-01 23:48:48 2022-03
6 2 2022-03-31 23:48:48 2022-06-01 23:48:48 2022-03
7 3 2022-03-31 23:47:36 2022-08-01 23:47:36 2022-03
8 3 2022-03-31 23:47:36 2022-08-01 23:47:36 2022-03
9 3 2022-03-31 23:47:36 2022-08-01 23:47:36 2022-03
10 3 2022-03-31 23:47:36 2022-08-01 23:47:36 2022-03
11 3 2022-03-31 23:47:36 2022-08-01 23:47:36 2022-03
12 3 2022-03-31 23:47:36 2022-08-01 23:47:36 2022-03
reporting_month_number months_length
0 1 3
1 2 3
2 3 3
3 1 4
4 2 4
5 3 4
6 4 4
7 1 6
8 2 6
9 3 6
10 4 6
11 5 6
12 6 6
Explanation:
- Repeat rows based on months_length
- Update Reporing Month Number based on groupby 'id'