I'm trying to find a method of duplicating all of the data in a row for every month between dates. Start date and end date.
This is the dataset:
ID | Start | End |
---|---|---|
1007 | 2022-03-01 | 2022-08-01 |
1008 | 2019-11-01 | 2020-02-01 |
What I would like to do is repeat the row, incrementing the date, every month between the start and end values. Example outcome:
ID | Start | End |
---|---|---|
1007 | 2022-03-01 | 2022-08-01 |
1007 | 2022-04-01 | 2022-08-01 |
1007 | 2022-05-01 | 2022-08-01 |
1007 | 2022-06-01 | 2022-08-01 |
1007 | 2022-07-01 | 2022-08-01 |
1007 | 2022-07-01 | 2022-08-01 |
1008 | 2019-11-01 | 2020-02-01 |
1008 | 2019-12-01 | 2020-02-01 |
1008 | 2020-01-01 | 2020-02-01 |
1008 | 2020-02-01 | 2020-02-01 |
Thanks
CodePudding user response:
you can move in all row data and check data_start is preset start duplicated and when present the date_end can you exit the loop Thanks
CodePudding user response:
here is one way to do it
create range from start and end, then use explode. drop the temporary created column 'range' and remove rows where start and end date are same
df=df.assign(range=df.apply(lambda x: pd.date_range(x['Start'], x['End'], freq='MS').strftime('%Y-%m-%d').tolist(), axis=1)).explode('range').reset_index()
df['Start'] = df['range']
df.drop(df[df['Start'] == df['End']].index, inplace=True)
df.drop(columns=['index','range'], inplace=True)
df
ID Start End
0 1007 2022-03-01 2022-08-01
1 1007 2022-04-01 2022-08-01
2 1007 2022-05-01 2022-08-01
3 1007 2022-06-01 2022-08-01
4 1007 2022-07-01 2022-08-01
6 1008 2019-11-01 2020-02-01
7 1008 2019-12-01 2020-02-01
8 1008 2020-01-01 2020-02-01