Home > Back-end >  Pandas: Generate Two Dates Per Year
Pandas: Generate Two Dates Per Year

Time:05-22

Newbie here. I'm trying to take a policy and create two records per year, split around their anniversary date. My input would be a single row for each policy, and would include an issue date and an end date. I want to expand that out to create two records for each policy, split around their anniversary date. My output should look like this:

policy_key issue_date end_date period_start period_end
12345 2005-03-15 2020-10-18 2005-03-15 2005-12-31
12345 2005-03-15 2020-10-18 2006-01-01 2006-03-14
12345 2005-03-15 2020-10-18 2006-03-15 2006-12-31
... ... ... ... ...
12345 2005-03-15 2020-10-18 2020-01-01 2020-03-14
12345 2005-03-15 2020-10-18 2020-03-15 2020-10-18

Here's what I've tried so far, but I can't figure out how to update the period_start dates to start at the beginning of the year or on the anniversary date, and period_end dates to end on the day prior to the anniversary date or on the last day of the year.

# importing libraries
import pandas as pd
import datetime as dt
import numpy as np
import math as math
from datetime import date, timedelta

# arbitrary policy inputs
policy_key = '12345'
issue_date = '2005-03-15'
issue_age = 35
duration = 10
end_date = '2020-10-18'
freq = pd.DateOffset(months=6)

# creating dataframe
df = pd.DataFrame({'policy_key':policy_key,
                    'issue_date':issue_date,
                    'issue_age':issue_age,
                    'duration':duration,
                    'attained_age':issue_age   duration - 1,
                    'period_start':pd.date_range(start=issue_date,end=end_date, freq=freq)})

df['issue_date'] =  pd.to_datetime(df['issue_date'], format='%Y-%m-%d')```

CodePudding user response:

Slightly complicated, but does the job!

# importing libraries
import pandas as pd
import datetime as dt
import numpy as np
import math as math
from datetime import date, timedelta

# arbitrary policy inputs
policy_key = '12345'
issue_date = '2005-03-15'
issue_age = 35
duration = 10
end_date = '2020-10-18'
freq = pd.DateOffset(months=6)

Set dates to datetime:

issue_date = pd.to_datetime(issue_date, format="%Y-%m-%d")
end_date = pd.to_datetime(end_date, format="%Y-%m-%d")

Create 3 lists for each date range:

For the start date:

  1. the given issue_date
  2. a list of dates at the beginning of each year, offset by issue_date.month-1 and issue_date.day-1
  3. a list of dates at the end of each year

Similar with the end dates, but offset by issue_date.day-2.

# start dates
start_dates = [issue_date]   \
    list(pd.date_range(issue_date,
                       end_date,
                       freq=pd.offsets.YearBegin())  \
         pd.offsets.DateOffset(months=issue_date.month-1,
                               days=issue_date.day-1))   \
    list(pd.date_range(issue_date,
                       end_date,
                       freq=pd.offsets.YearBegin()))

# end dates

end_dates = [end_date]   \
    list(pd.date_range(issue_date,
                   end_date,
                   freq=pd.offsets.YearEnd()))   \
    list(pd.date_range(issue_date,
                       end_date,
                       freq=pd.offsets.YearBegin())  \
         pd.offsets.DateOffset(months=issue_date.month-1,
                               days=issue_date.day-2))

Sort the values so they are in order:

# sort values
start_dates.sort()
end_dates.sort()

Add as columns to DataFrame:

df = pd.DataFrame({'policy_key': policy_key,
                   'issue_date': issue_date,
                   'end_date': end_date,
                   'issue_age': issue_age,
                   'duration': duration,
                   'attained_age': issue_age   duration - 1,
                   'period_start': start_dates,
                   'period_end': end_dates})
#    policy_key issue_date   end_date  issue_age  duration  attained_age period_start period_end
# 0       12345 2005-03-15 2020-10-18         35        10            44   2005-03-15 2005-12-31
# 1       12345 2005-03-15 2020-10-18         35        10            44   2006-01-01 2006-03-14
# 2       12345 2005-03-15 2020-10-18         35        10            44   2006-03-15 2006-12-31
# 3       12345 2005-03-15 2020-10-18         35        10            44   2007-01-01 2007-03-14
# 4       12345 2005-03-15 2020-10-18         35        10            44   2007-03-15 2007-12-31
# 5       12345 2005-03-15 2020-10-18         35        10            44   2008-01-01 2008-03-14
# 6       12345 2005-03-15 2020-10-18         35        10            44   2008-03-15 2008-12-31
# 7       12345 2005-03-15 2020-10-18         35        10            44   2009-01-01 2009-03-14
# 8       12345 2005-03-15 2020-10-18         35        10            44   2009-03-15 2009-12-31
# 9       12345 2005-03-15 2020-10-18         35        10            44   2010-01-01 2010-03-14
# 10      12345 2005-03-15 2020-10-18         35        10            44   2010-03-15 2010-12-31
# 11      12345 2005-03-15 2020-10-18         35        10            44   2011-01-01 2011-03-14
# 12      12345 2005-03-15 2020-10-18         35        10            44   2011-03-15 2011-12-31
# 13      12345 2005-03-15 2020-10-18         35        10            44   2012-01-01 2012-03-14
# 14      12345 2005-03-15 2020-10-18         35        10            44   2012-03-15 2012-12-31
# 15      12345 2005-03-15 2020-10-18         35        10            44   2013-01-01 2013-03-14
# 16      12345 2005-03-15 2020-10-18         35        10            44   2013-03-15 2013-12-31
# 17      12345 2005-03-15 2020-10-18         35        10            44   2014-01-01 2014-03-14
# 18      12345 2005-03-15 2020-10-18         35        10            44   2014-03-15 2014-12-31
# 19      12345 2005-03-15 2020-10-18         35        10            44   2015-01-01 2015-03-14
# 20      12345 2005-03-15 2020-10-18         35        10            44   2015-03-15 2015-12-31
# 21      12345 2005-03-15 2020-10-18         35        10            44   2016-01-01 2016-03-14
# 22      12345 2005-03-15 2020-10-18         35        10            44   2016-03-15 2016-12-31
# 23      12345 2005-03-15 2020-10-18         35        10            44   2017-01-01 2017-03-14
# 24      12345 2005-03-15 2020-10-18         35        10            44   2017-03-15 2017-12-31
# 25      12345 2005-03-15 2020-10-18         35        10            44   2018-01-01 2018-03-14
# 26      12345 2005-03-15 2020-10-18         35        10            44   2018-03-15 2018-12-31
# 27      12345 2005-03-15 2020-10-18         35        10            44   2019-01-01 2019-03-14
# 28      12345 2005-03-15 2020-10-18         35        10            44   2019-03-15 2019-12-31
# 29      12345 2005-03-15 2020-10-18         35        10            44   2020-01-01 2020-03-14
# 30      12345 2005-03-15 2020-10-18         35        10            44   2020-03-15 2020-10-18
  • Related