Newbie here. I'm trying to take a policy and create two records per year, split around their anniversary date. My input would be a single row for each policy, and would include an issue date and an end date. I want to expand that out to create two records for each policy, split around their anniversary date. My output should look like this:
policy_key | issue_date | end_date | period_start | period_end |
---|---|---|---|---|
12345 | 2005-03-15 | 2020-10-18 | 2005-03-15 | 2005-12-31 |
12345 | 2005-03-15 | 2020-10-18 | 2006-01-01 | 2006-03-14 |
12345 | 2005-03-15 | 2020-10-18 | 2006-03-15 | 2006-12-31 |
... | ... | ... | ... | ... |
12345 | 2005-03-15 | 2020-10-18 | 2020-01-01 | 2020-03-14 |
12345 | 2005-03-15 | 2020-10-18 | 2020-03-15 | 2020-10-18 |
Here's what I've tried so far, but I can't figure out how to update the period_start dates to start at the beginning of the year or on the anniversary date, and period_end dates to end on the day prior to the anniversary date or on the last day of the year.
# importing libraries
import pandas as pd
import datetime as dt
import numpy as np
import math as math
from datetime import date, timedelta
# arbitrary policy inputs
policy_key = '12345'
issue_date = '2005-03-15'
issue_age = 35
duration = 10
end_date = '2020-10-18'
freq = pd.DateOffset(months=6)
# creating dataframe
df = pd.DataFrame({'policy_key':policy_key,
'issue_date':issue_date,
'issue_age':issue_age,
'duration':duration,
'attained_age':issue_age duration - 1,
'period_start':pd.date_range(start=issue_date,end=end_date, freq=freq)})
df['issue_date'] = pd.to_datetime(df['issue_date'], format='%Y-%m-%d')```
CodePudding user response:
Slightly complicated, but does the job!
# importing libraries
import pandas as pd
import datetime as dt
import numpy as np
import math as math
from datetime import date, timedelta
# arbitrary policy inputs
policy_key = '12345'
issue_date = '2005-03-15'
issue_age = 35
duration = 10
end_date = '2020-10-18'
freq = pd.DateOffset(months=6)
Set dates to datetime
:
issue_date = pd.to_datetime(issue_date, format="%Y-%m-%d")
end_date = pd.to_datetime(end_date, format="%Y-%m-%d")
Create 3 lists for each date range:
For the start date:
- the given
issue_date
- a list of dates at the beginning of each year, offset by
issue_date.month-1
andissue_date.day-1
- a list of dates at the end of each year
Similar with the end dates, but offset by issue_date.day-2
.
# start dates
start_dates = [issue_date] \
list(pd.date_range(issue_date,
end_date,
freq=pd.offsets.YearBegin()) \
pd.offsets.DateOffset(months=issue_date.month-1,
days=issue_date.day-1)) \
list(pd.date_range(issue_date,
end_date,
freq=pd.offsets.YearBegin()))
# end dates
end_dates = [end_date] \
list(pd.date_range(issue_date,
end_date,
freq=pd.offsets.YearEnd())) \
list(pd.date_range(issue_date,
end_date,
freq=pd.offsets.YearBegin()) \
pd.offsets.DateOffset(months=issue_date.month-1,
days=issue_date.day-2))
Sort the values so they are in order:
# sort values
start_dates.sort()
end_dates.sort()
Add as columns to DataFrame:
df = pd.DataFrame({'policy_key': policy_key,
'issue_date': issue_date,
'end_date': end_date,
'issue_age': issue_age,
'duration': duration,
'attained_age': issue_age duration - 1,
'period_start': start_dates,
'period_end': end_dates})
# policy_key issue_date end_date issue_age duration attained_age period_start period_end
# 0 12345 2005-03-15 2020-10-18 35 10 44 2005-03-15 2005-12-31
# 1 12345 2005-03-15 2020-10-18 35 10 44 2006-01-01 2006-03-14
# 2 12345 2005-03-15 2020-10-18 35 10 44 2006-03-15 2006-12-31
# 3 12345 2005-03-15 2020-10-18 35 10 44 2007-01-01 2007-03-14
# 4 12345 2005-03-15 2020-10-18 35 10 44 2007-03-15 2007-12-31
# 5 12345 2005-03-15 2020-10-18 35 10 44 2008-01-01 2008-03-14
# 6 12345 2005-03-15 2020-10-18 35 10 44 2008-03-15 2008-12-31
# 7 12345 2005-03-15 2020-10-18 35 10 44 2009-01-01 2009-03-14
# 8 12345 2005-03-15 2020-10-18 35 10 44 2009-03-15 2009-12-31
# 9 12345 2005-03-15 2020-10-18 35 10 44 2010-01-01 2010-03-14
# 10 12345 2005-03-15 2020-10-18 35 10 44 2010-03-15 2010-12-31
# 11 12345 2005-03-15 2020-10-18 35 10 44 2011-01-01 2011-03-14
# 12 12345 2005-03-15 2020-10-18 35 10 44 2011-03-15 2011-12-31
# 13 12345 2005-03-15 2020-10-18 35 10 44 2012-01-01 2012-03-14
# 14 12345 2005-03-15 2020-10-18 35 10 44 2012-03-15 2012-12-31
# 15 12345 2005-03-15 2020-10-18 35 10 44 2013-01-01 2013-03-14
# 16 12345 2005-03-15 2020-10-18 35 10 44 2013-03-15 2013-12-31
# 17 12345 2005-03-15 2020-10-18 35 10 44 2014-01-01 2014-03-14
# 18 12345 2005-03-15 2020-10-18 35 10 44 2014-03-15 2014-12-31
# 19 12345 2005-03-15 2020-10-18 35 10 44 2015-01-01 2015-03-14
# 20 12345 2005-03-15 2020-10-18 35 10 44 2015-03-15 2015-12-31
# 21 12345 2005-03-15 2020-10-18 35 10 44 2016-01-01 2016-03-14
# 22 12345 2005-03-15 2020-10-18 35 10 44 2016-03-15 2016-12-31
# 23 12345 2005-03-15 2020-10-18 35 10 44 2017-01-01 2017-03-14
# 24 12345 2005-03-15 2020-10-18 35 10 44 2017-03-15 2017-12-31
# 25 12345 2005-03-15 2020-10-18 35 10 44 2018-01-01 2018-03-14
# 26 12345 2005-03-15 2020-10-18 35 10 44 2018-03-15 2018-12-31
# 27 12345 2005-03-15 2020-10-18 35 10 44 2019-01-01 2019-03-14
# 28 12345 2005-03-15 2020-10-18 35 10 44 2019-03-15 2019-12-31
# 29 12345 2005-03-15 2020-10-18 35 10 44 2020-01-01 2020-03-14
# 30 12345 2005-03-15 2020-10-18 35 10 44 2020-03-15 2020-10-18