Home > Enterprise >  expand row based on integer in column and split into number of months between dates
expand row based on integer in column and split into number of months between dates

Time:08-02

I have the following dataframe:

id date_start date_end reporting_month reporting_month_number months_length
1 2022-03-31 23:56:22 2022-05-01 23:56:22 2022-03 1 3
2 2022-03-31 23:48:48 2022-06-01 23:48:48 2022-03 1 4
3 2022-03-31 23:47:36 2022-08-01 23:47:36 2022-03 1 6

I would like to split each id row so I can have a row for each of the months_length, starting on the date of reporting_month, like this:

id date_start date_end reporting_month reporting_month_number months_length
1 2022-03-31 23:56:22 2022-05-01 23:56:22 2022-03 1 3
1 2022-03-31 23:56:22 2022-05-01 23:56:22 2022-04 2 3
1 2022-03-31 23:56:22 2022-05-01 23:56:22 2022-05 3 3
2 2022-03-31 23:48:48 2022-06-01 23:48:48 2022-03 1 4
2 2022-03-31 23:48:48 2022-06-01 23:48:48 2022-03 2 4
2 2022-03-31 23:48:48 2022-06-01 23:48:48 2022-04 3 4
2 2022-03-31 23:48:48 2022-06-01 23:48:48 2022-05 4 4
3 2022-03-31 23:47:36 2022-08-01 23:47:36 2022-03 1 6
3 2022-03-31 23:47:36 2022-08-01 23:47:36 2022-04 2 6
3 2022-03-31 23:47:36 2022-08-01 23:47:36 2022-05 3 6
3 2022-03-31 23:47:36 2022-08-01 23:47:36 2022-06 4 6
3 2022-03-31 23:47:36 2022-08-01 23:47:36 2022-07 5 6
3 2022-03-31 23:47:36 2022-08-01 23:47:36 2022-08 6 6

I have tried several approaches but I can't seem to reach my objective.

Does anyone have a suggestion on how to achieve this?

Thanks.

CodePudding user response:

One possible solution is,

df= df.loc[df.index.repeat(df['months_length'])].reset_index(drop=True)
df['reporting_month_number'] = df.groupby('id')['reporting_month_number'].cumsum()

O/P:

    id           date_start             date_end reporting_month  \
0    1  2022-03-31 23:56:22  2022-05-01 23:56:22         2022-03   
1    1  2022-03-31 23:56:22  2022-05-01 23:56:22         2022-03   
2    1  2022-03-31 23:56:22  2022-05-01 23:56:22         2022-03   
3    2  2022-03-31 23:48:48  2022-06-01 23:48:48         2022-03   
4    2  2022-03-31 23:48:48  2022-06-01 23:48:48         2022-03   
5    2  2022-03-31 23:48:48  2022-06-01 23:48:48         2022-03   
6    2  2022-03-31 23:48:48  2022-06-01 23:48:48         2022-03   
7    3  2022-03-31 23:47:36  2022-08-01 23:47:36         2022-03   
8    3  2022-03-31 23:47:36  2022-08-01 23:47:36         2022-03   
9    3  2022-03-31 23:47:36  2022-08-01 23:47:36         2022-03   
10   3  2022-03-31 23:47:36  2022-08-01 23:47:36         2022-03   
11   3  2022-03-31 23:47:36  2022-08-01 23:47:36         2022-03   
12   3  2022-03-31 23:47:36  2022-08-01 23:47:36         2022-03   

    reporting_month_number  months_length  
0                        1              3  
1                        2              3  
2                        3              3  
3                        1              4  
4                        2              4  
5                        3              4  
6                        4              4  
7                        1              6  
8                        2              6  
9                        3              6  
10                       4              6  
11                       5              6  
12                       6              6  

​ Explanation:

  1. Repeat rows based on months_length
  2. Update Reporing Month Number based on groupby 'id'
  • Related