I have a dataframe like this:
id | start | end |
---|---|---|
a1 | 2018-03-04 | 2020-03-03 |
b1 | 2019-01-24 | 2021-05-08 |
And I would like an output of this:
id | start | end |
---|---|---|
a1 | 2018-03-04 | 2019-03-03 |
a1 | 2019-03-04 | 2020-03-03 |
b1 | 2019-01-24 | 2020-01-23 |
b1 | 2020-01-24 | 2021-01-23 |
b1 | 2021-01-24 | 2021-05-08 |
Thank you!
CodePudding user response:
Try this:
def split_date_range(row):
start, end = row[['start', 'end']]
dates = []
s = start
while True:
e = s pd.DateOffset(years=1) - pd.Timedelta(days=1)
if e < end:
dates.append((s, e))
s = e pd.Timedelta(days=1)
else:
dates.append((s, end))
break
return dates
tmp = df.set_index('id').apply(split_date_range, axis=1).explode()
result = pd.DataFrame([
[id_, *timerange] for id_, timerange in tmp.iteritems()
], columns=['id', 'start', 'end'])
The code uses id_
to avoid collision with the id()
built-in function.