Home > Net >  Pandas duplicate data between 2 dates
Pandas duplicate data between 2 dates

Time:07-04

I'm trying to find a method of duplicating all of the data in a row for every month between dates. Start date and end date.

This is the dataset:

ID Start End
1007 2022-03-01 2022-08-01
1008 2019-11-01 2020-02-01

What I would like to do is repeat the row, incrementing the date, every month between the start and end values. Example outcome:

ID Start End
1007 2022-03-01 2022-08-01
1007 2022-04-01 2022-08-01
1007 2022-05-01 2022-08-01
1007 2022-06-01 2022-08-01
1007 2022-07-01 2022-08-01
1007 2022-07-01 2022-08-01
1008 2019-11-01 2020-02-01
1008 2019-12-01 2020-02-01
1008 2020-01-01 2020-02-01
1008 2020-02-01 2020-02-01

Thanks

CodePudding user response:

you can move in all row data and check data_start is preset start duplicated and when present the date_end can you exit the loop Thanks

CodePudding user response:

here is one way to do it

create range from start and end, then use explode. drop the temporary created column 'range' and remove rows where start and end date are same

df=df.assign(range=df.apply(lambda x: pd.date_range(x['Start'], x['End'], freq='MS').strftime('%Y-%m-%d').tolist(), axis=1)).explode('range').reset_index()
df['Start'] = df['range']
df.drop(df[df['Start'] == df['End']].index, inplace=True)
df.drop(columns=['index','range'], inplace=True)
df
    ID           Start        End
0   1007    2022-03-01  2022-08-01
1   1007    2022-04-01  2022-08-01
2   1007    2022-05-01  2022-08-01
3   1007    2022-06-01  2022-08-01
4   1007    2022-07-01  2022-08-01
6   1008    2019-11-01  2020-02-01
7   1008    2019-12-01  2020-02-01
8   1008    2020-01-01  2020-02-01
  • Related