Home > Software design >  How to Convert Pandas Week to User Supported Business Logic?
How to Convert Pandas Week to User Supported Business Logic?

Time:10-27

The goal is to convert a pandas timestamp object to week of the year according to the following user defined calendars for 2021 and 2022 respectively.

enter image description here

enter image description here

I am using the week property of pandas datetime, which worked for dates in 2021, but they are breaking down for the next year. This is the initial function I wrote.

def week(date: pandas.Timestamp) -> int:
    """Convert the date to a week according to client calendar."""
    orig_week: int = date.week % 53
    return orig_week   1 if date.dayofweek < 5 else orig_week   2

I added the modulo 53 since without it sometimes the rest of the logic gives me numbers like 54 or something. But I am not sure about the internal logic of the pandas week property, so cannot really get a grip on how to convert that week to the above user defined calendar, although it sounds like a simple shift. The problems are around the edge cases (end or beginning of year). So any help will be appreciated.

CodePudding user response:

One solution is to create a custom calendar for the input year. The dates in the year will be converted to a Period representing weeks that start on Saturday and end on Friday.

import pandas as pd

def week(in_date: pd.Timestamp) -> int:
    """Convert the date to a week according to client calendar."""
    # Setup a custom calendar table for the year of input date
    in_date_year = str(in_date.year)
    _df = pd.DataFrame({'Date':pd.date_range(in_date_year '-1-1', in_date_year '-12-31')})
    _df['Period'] = _df['Date'].dt.to_period('W-FRI')   # define week period that starts on SAT and ends on FRI 
    _df['Week_Num'] = _df['Period'].dt.week

    # Adjust week number for year start 
    _df['Week_Num'] = np.where(_df['Week_Num'].iloc[0] >= 52, _df['Week_Num'] % 53   1, _df['Week_Num'])
    # Adjust week number for year end
    _df.iloc[-7:, _df.columns.get_loc('Week_Num')] = np.where(_df['Week_Num'].iloc[-7:] < 52, 53, _df['Week_Num'].iloc[-7:])

    # Get week number and return
    return _df.loc[_df['Date'] == in_date, 'Week_Num'].iat[0]

The week numbers returned for the Period will be set according to weeks that start on SAT and end on FRI. However, for the year start and year end, the week number may still show week number corresponding to the previous/next year. Hence, we checked and adjusted for this year start/end discrepancy accordingly.

Result:

week(pd.Timestamp('2022-01-01'))
#output
1

week(pd.Timestamp('2022-12-31'))
#output
53

The underlying table is built as follows:

For year 2022:

# print first 10 rows of the year
print(_df.head(10))

        Date                 Period  Week_Num
0 2022-01-01  2022-01-01/2022-01-07         1
1 2022-01-02  2022-01-01/2022-01-07         1
2 2022-01-03  2022-01-01/2022-01-07         1
3 2022-01-04  2022-01-01/2022-01-07         1
4 2022-01-05  2022-01-01/2022-01-07         1
5 2022-01-06  2022-01-01/2022-01-07         1
6 2022-01-07  2022-01-01/2022-01-07         1
7 2022-01-08  2022-01-08/2022-01-14         2
8 2022-01-09  2022-01-08/2022-01-14         2
9 2022-01-10  2022-01-08/2022-01-14         2


# print last 10 rows of the year
print(_df.tail(10))

          Date                 Period  Week_Num
355 2022-12-22  2022-12-17/2022-12-23        51
356 2022-12-23  2022-12-17/2022-12-23        51
357 2022-12-24  2022-12-24/2022-12-30        52
358 2022-12-25  2022-12-24/2022-12-30        52
359 2022-12-26  2022-12-24/2022-12-30        52
360 2022-12-27  2022-12-24/2022-12-30        52
361 2022-12-28  2022-12-24/2022-12-30        52
362 2022-12-29  2022-12-24/2022-12-30        52
363 2022-12-30  2022-12-24/2022-12-30        52
364 2022-12-31  2022-12-31/2023-01-06        53

For year 2025:

# print first 10 rows of the year
print(_df.head(10))

        Date                 Period  Week_Num
0 2025-01-01  2024-12-28/2025-01-03         1
1 2025-01-02  2024-12-28/2025-01-03         1
2 2025-01-03  2024-12-28/2025-01-03         1
3 2025-01-04  2025-01-04/2025-01-10         2
4 2025-01-05  2025-01-04/2025-01-10         2
5 2025-01-06  2025-01-04/2025-01-10         2
6 2025-01-07  2025-01-04/2025-01-10         2
7 2025-01-08  2025-01-04/2025-01-10         2
8 2025-01-09  2025-01-04/2025-01-10         2
9 2025-01-10  2025-01-04/2025-01-10         2

# print last 10 rows of the year
print(_df.tail(10))

          Date                 Period  Week_Num
355 2025-12-22  2025-12-20/2025-12-26        52
356 2025-12-23  2025-12-20/2025-12-26        52
357 2025-12-24  2025-12-20/2025-12-26        52
358 2025-12-25  2025-12-20/2025-12-26        52
359 2025-12-26  2025-12-20/2025-12-26        52
360 2025-12-27  2025-12-27/2026-01-02        53
361 2025-12-28  2025-12-27/2026-01-02        53
362 2025-12-29  2025-12-27/2026-01-02        53
363 2025-12-30  2025-12-27/2026-01-02        53
364 2025-12-31  2025-12-27/2026-01-02        53
  • Related