Home > Mobile >  Pandas - Alter week number based on the day of the week derived from the date column
Pandas - Alter week number based on the day of the week derived from the date column

Time:01-21

Suppose I have the following dataframe.

Date Week_Num WeekDay
01/01/23 1 Sunday
02/01/23 1 Monday
04/01/23 1 Wednesday
05/01/23 1 Thursday
07/01/23 1 Saturday

I understand that the third row is in the first week, however I want to use Wednesday as a cut off point as by this point half the week has passed so the following days should move to the next week, such as below.

Date Week_Num WeekDay
01/01/23 1 Sunday
02/01/23 1 Monday
04/01/23 1 Wednesday
05/01/23 2 Thursday
07/01/23 2 Saturday

My attempts so far have been sporadic in their success, this is also somewhat of a edge case but one set of data seems to be prone to these sort of oddities so I wanted a solution.

The idea I have in my head is to use something like:

if Weekday-Number > 3 then Week_Num   1
else do nothing

I understand how to do each part separately, but bringing them together is where I get stuck. Any help would be greatly appreciated.

CodePudding user response:

"Shortly" (one-liner), you can use cumsum():

df['Week_Num'] = df['WeekDay'].eq('Wednesday').cumsum().add(1).shift(1).fillna(1).astype(int)

Example input:

df = pd.DataFrame.from_dict({
    'Date': ['01/01/23', '02/01/23', '04/01/23', '05/01/23', '07/01/23', '07/01/23', '07/01/23', '07/01/23', '07/01/23', '07/01/23'],
    'WeekDay': ['Sunday', 'Monday', 'Wednesday', 'Thursday', 'Saturday', 'Wednesday', 'Thursday', 'Saturday', 'Wednesday', 'Thursday']
})

Outputs:

       Date    WeekDay  Week_Num
0  01/01/23     Sunday         1
1  02/01/23     Monday         1
2  04/01/23  Wednesday         1
3  05/01/23   Thursday         2
4  07/01/23   Saturday         2
5  07/01/23  Wednesday         2
6  07/01/23   Thursday         3
7  07/01/23   Saturday         3
8  07/01/23  Wednesday         3
9  07/01/23   Thursday         4


Explanations:

  1. We introduce a new column Week_Num to df
  2. We use cumsum() which accumulates a value when we meet a specific requirement notated by .eq
  3. The .eq part is if we meet a WeekDay 'Wednesday'
  4. We add one to each row because cumsum starts from 0 and we want to start from 1.
  5. We shift downwards each row in the dataframe by 1 to change only the rows not including 'Wednesday'
  6. Because we shifted each row downwards then the first row will have NaN - so we fill it with the week_num of 1
  7. We convert the row to have integer values with the astype(int)

CodePudding user response:

You can use pandas Timedelta objects.

Using just your Date column, you can convert it to a pandas datetime object (and in fact use it to create your WeekDay column).

>>> import pandas as pd
>>> df = pd.DataFrame( 
    data = {"Date":["1/1/23", "2/1/23", "4/1/23", "5/1/23", "7/1/23" ]}
)
>>> df.Date = pd.to_datetime( df.Date, dayfirst=True )
>>> df

    Date
0   2023-01-02
1   2023-01-03
2   2023-01-04
3   2023-01-05

Create the WeekDay column:

>>> dayOfWeekMap = { 0: "Monday", 1: "Tuesday", 2: "Wednesday", 3: "Thursday", 4: "Friday",
                5: "Saturday", 6: "Sunday" }
>>> df["WeekDay"] = df.Date.dt.dayofweek.map( dayOfWeekMap )
>>> df

    Date    WeekDay
0   2023-01-01  Sunday
1   2023-01-02  Monday
2   2023-01-04  Wednesday
3   2023-01-05  Thursday
4   2023-01-07  Saturday

Finally, get the week number with your custom cutoff. First, define the day you want to start. For your test data, I would start at Wednesday 28/12/2023 so that 3/1/2023 is has Week_Num = 1, but 4/1/2023 has Week_Num = 2.

>>> start_date = pd.to_datetime( "28/12/2022", dayfirst=True )
>>> df["Week_Num"] = ( ( df.Date - start_date ).dt.days // 7 ).astype( int )   1
>>> df

    Date        WeekDay Week_Num
0   2023-01-01  Sunday      1
1   2023-01-02  Monday      1
2   2023-01-04  Wednesday   2
3   2023-01-05  Thursday    2
4   2023-01-07  Saturday    2

What's happening here: We take the difference in the number of days from the date in the observation from the start date, floor division by 7 (so 6 days after start date is 0, 7 days from start date is 1, 8 days from start date is 1), and then add 1 so that our counter starts at 1.

CodePudding user response:

There is no simple, non-iterative solution using the current data frame. If the WeekDay column was expressed as a number rather than text, a simple df.loc[] statement would give you the desired result.

df.loc[df["WeekDay"] > 4, "Week_Num"]  = 1

This is saying to locate the rows where the Weekday > 4 and increment the Week_Num value by 1.

Using the data frame you have posted, a slower, iterative solution (not recommended) can be used along with the weekday() function from the datetime library.

#loop through each data frame row
for i, row in df.iterrows():
    #if the date is past wednesday, increment week_num
    if(row["dates"].weekday() > 4):
        df.at[i, "week_num"]  = 1                                                    

The code iterates through each row item and increments the Week_Num based on an if statement. Using this method also means the WeekDay column is obsolete since datetime allows you to get the weekday number from the date.

  • Related