I've been banging my head against trying to build a weekly date range for multiple years for about a day now so I figured I'd ask somewhere where people probably know a lot more than me. My goal ultimately is to merge a daily dataset with a weekly one and I need consistent dates across those two datasets in order to get the TimeSeries library that I want to use (DARTs) to accept my dates. My attempt at generating this looks like so:
all_weeks_generated = pd.DataFrame()
all_weeks_generated['week_dt'] = pd.date_range(start = '2001-01-01', end = '2021-09-13', freq='W')
all_weeks_generated['week'] = all_weeks_generated['week_dt'].dt.isocalendar().week
all_weeks_generated['year'] = all_weeks_generated['week_dt'].dt.year
but that spits out things that seem very wrong, for instance:
all_weeks_generated[(all_weeks_generated.year == 2017) & (all_weeks_generated.week == 52)]
returns:
week_dt week year
834 2017-01-01 52 2017
886 2017-12-31 52 2017
Jan 1 2017 is not the 52nd week of 2017, sadly. I get that Pandas is trying to tell me that 2017-01-01 is in the 52nd week of 2016 but I'm not sure what to do with that. This means that I can't look across my daily data, determine its week, and then get the beginning of the week in which it might occur. I've dug through as much of the Pandas documentation as I can stomach and it seems as though I'm fundamentally doing something wrong. Any tips on what I've done wrong and whether there's a way to modify my approach or another approach that I might look at which would let me make a list of weeks (I'm agnostic to Mon vs Sun, just want something that has one date per week) starting on Jan 1, 2001 and ending on September 13, 2021?
CodePudding user response:
You can set to 0 all weeks equal to 52 at the start of year:
>>> all_weeks_generated[(all_weeks_generated['week_dt'].dt.month.isin([1, 12]))
& (all_weeks_generated['week'] == 52)]
week_dt week year
51 2001-12-30 52 2001
103 2002-12-29 52 2002
155 2003-12-28 52 2003
207 2004-12-26 52 2004
260 2006-01-01 52 2006 # 52 -> 0
312 2006-12-31 52 2006
364 2007-12-30 52 2007
416 2008-12-28 52 2008
468 2009-12-27 52 2009
521 2011-01-02 52 2011 # 52 -> 0
573 2012-01-01 52 2012 # 52 -> 0
625 2012-12-30 52 2012
677 2013-12-29 52 2013
729 2014-12-28 52 2014
781 2015-12-27 52 2015
834 2017-01-01 52 2017 # 52 -> 0
886 2017-12-31 52 2017
938 2018-12-30 52 2018
990 2019-12-29 52 2019
1042 2020-12-27 52 2020
# 52 -> 0
all_weeks_generated.loc[(all_weeks_generated['week_dt'].dt.month == 1)
& (all_weeks_generated['week'] == 52), 'week'] = 0