Suppose I have the following dataframe.
Date | Week_Num | WeekDay |
---|---|---|
01/01/23 | 1 | Sunday |
02/01/23 | 1 | Monday |
04/01/23 | 1 | Wednesday |
05/01/23 | 1 | Thursday |
07/01/23 | 1 | Saturday |
I understand that the third row is in the first week, however I want to use Wednesday as a cut off point as by this point half the week has passed so the following days should move to the next week, such as below.
Date | Week_Num | WeekDay |
---|---|---|
01/01/23 | 1 | Sunday |
02/01/23 | 1 | Monday |
04/01/23 | 1 | Wednesday |
05/01/23 | 2 | Thursday |
07/01/23 | 2 | Saturday |
My attempts so far have been sporadic in their success, this is also somewhat of a edge case but one set of data seems to be prone to these sort of oddities so I wanted a solution.
The idea I have in my head is to use something like:
if Weekday-Number > 3 then Week_Num 1
else do nothing
I understand how to do each part separately, but bringing them together is where I get stuck. Any help would be greatly appreciated.
CodePudding user response:
"Shortly" (one-liner), you can use cumsum()
:
df['Week_Num'] = df['WeekDay'].eq('Wednesday').cumsum().add(1).shift(1).fillna(1).astype(int)
Example input:
df = pd.DataFrame.from_dict({
'Date': ['01/01/23', '02/01/23', '04/01/23', '05/01/23', '07/01/23', '07/01/23', '07/01/23', '07/01/23', '07/01/23', '07/01/23'],
'WeekDay': ['Sunday', 'Monday', 'Wednesday', 'Thursday', 'Saturday', 'Wednesday', 'Thursday', 'Saturday', 'Wednesday', 'Thursday']
})
Outputs:
Date WeekDay Week_Num
0 01/01/23 Sunday 1
1 02/01/23 Monday 1
2 04/01/23 Wednesday 1
3 05/01/23 Thursday 2
4 07/01/23 Saturday 2
5 07/01/23 Wednesday 2
6 07/01/23 Thursday 3
7 07/01/23 Saturday 3
8 07/01/23 Wednesday 3
9 07/01/23 Thursday 4
Explanations:
- We introduce a new column
Week_Num
todf
- We use
cumsum()
which accumulates a value when we meet a specific requirement notated by.eq
- The
.eq
part is if we meet a WeekDay 'Wednesday' - We add one to each row because
cumsum
starts from 0 and we want to start from 1. - We shift downwards each row in the dataframe by 1 to change only the rows not including 'Wednesday'
- Because we shifted each row downwards then the first row will have
NaN
- so we fill it with the week_num of1
- We convert the row to have integer values with the
astype(int)
CodePudding user response:
You can use pandas Timedelta objects.
Using just your Date
column, you can convert it to a pandas datetime object (and in fact use it to create your WeekDay
column).
>>> import pandas as pd
>>> df = pd.DataFrame(
data = {"Date":["1/1/23", "2/1/23", "4/1/23", "5/1/23", "7/1/23" ]}
)
>>> df.Date = pd.to_datetime( df.Date, dayfirst=True )
>>> df
Date
0 2023-01-02
1 2023-01-03
2 2023-01-04
3 2023-01-05
Create the WeekDay
column:
>>> dayOfWeekMap = { 0: "Monday", 1: "Tuesday", 2: "Wednesday", 3: "Thursday", 4: "Friday",
5: "Saturday", 6: "Sunday" }
>>> df["WeekDay"] = df.Date.dt.dayofweek.map( dayOfWeekMap )
>>> df
Date WeekDay
0 2023-01-01 Sunday
1 2023-01-02 Monday
2 2023-01-04 Wednesday
3 2023-01-05 Thursday
4 2023-01-07 Saturday
Finally, get the week number with your custom cutoff. First, define the day you want to start. For your test data, I would start at Wednesday 28/12/2023 so that 3/1/2023 is has Week_Num = 1
, but 4/1/2023 has Week_Num = 2
.
>>> start_date = pd.to_datetime( "28/12/2022", dayfirst=True )
>>> df["Week_Num"] = ( ( df.Date - start_date ).dt.days // 7 ).astype( int ) 1
>>> df
Date WeekDay Week_Num
0 2023-01-01 Sunday 1
1 2023-01-02 Monday 1
2 2023-01-04 Wednesday 2
3 2023-01-05 Thursday 2
4 2023-01-07 Saturday 2
What's happening here: We take the difference in the number of days from the date in the observation from the start date, floor division by 7 (so 6 days after start date is 0, 7 days from start date is 1, 8 days from start date is 1), and then add 1 so that our counter starts at 1.
CodePudding user response:
There is no simple, non-iterative solution using the current data frame. If the WeekDay column was expressed as a number rather than text, a simple df.loc[] statement would give you the desired result.
df.loc[df["WeekDay"] > 4, "Week_Num"] = 1
This is saying to locate the rows where the Weekday > 4 and increment the Week_Num value by 1.
Using the data frame you have posted, a slower, iterative solution (not recommended) can be used along with the weekday() function from the datetime library.
#loop through each data frame row
for i, row in df.iterrows():
#if the date is past wednesday, increment week_num
if(row["dates"].weekday() > 4):
df.at[i, "week_num"] = 1
The code iterates through each row item and increments the Week_Num based on an if statement. Using this method also means the WeekDay column is obsolete since datetime allows you to get the weekday number from the date.