I've been trying a handful of methods here and can't seem to get it right. What I want to do is look at a datetime stamp column, and create a new column that has the Sunday start of that week.
It MOSTLY works, except if the datetime stamp is on a Sunday, then instead of giving that Sunday as the start of the week, it refers back to the previous Sunday.
Sample dataset:
import pandas as pd
data = [
['0', '2022-05-22 00:25:13'],
['1', '2022-05-25 14:59:50'],
['2', '2022-05-28 17:32:37'],
['3', '2022-08-15 11:07:52'],
['4', '2022-04-08 16:52:39']]
columns = ['id', 'event_timestamp_et']
df = pd.DataFrame(data, columns=columns)
df['event_timestamp_et'] = pd.to_datetime(df['event_timestamp_et'])
Expected output:
id event_timestamp_et startWeek
0 0 2022-05-22 00:25:13 2022-05-22
1 1 2022-05-25 14:59:50 2022-05-22
2 2 2022-05-28 17:32:37 2022-05-22
3 3 2022-08-15 11:07:52 2022-08-14
4 4 2022-04-08 16:52:39 2022-04-03
Here is what I have tried (I've included a few methods), as you can see, not quite working:
df['startWeek'] = df['event_timestamp_et'] - pd.offsets.Week(weekday=6)
df['startWeek'] = df['startWeek'].dt.normalize()
df['startWeek1'] = df['event_timestamp_et'] - pd.to_timedelta(df['event_timestamp_et'].dt.dayofweek, unit='d')
df['startWeek2'] = df['event_timestamp_et'].dt.to_period('W-SUN').apply(lambda r: r.start_time)
df["startWeek3"] = df['event_timestamp_et'].dt.to_period('W').dt.start_time pd.Timedelta(6, unit='d')
Actual output:
id event_timestamp_et startWeek startWeek1 startWeek2 startWeek3
0 0 2022-05-22 00:25:13 2022-05-15 2022-05-16 00:25:13 2022-05-16 2022-05-22
1 1 2022-05-25 14:59:50 2022-05-22 2022-05-23 14:59:50 2022-05-23 2022-05-29
2 2 2022-05-28 17:32:37 2022-05-22 2022-05-23 17:32:37 2022-05-23 2022-05-29
3 3 2022-08-15 11:07:52 2022-08-14 2022-08-15 11:07:52 2022-08-15 2022-08-21
4 4 2022-04-08 16:52:39 2022-04-03 2022-04-04 16:52:39 2022-04-04 2022-04-10
CodePudding user response:
One way using W-SAT
not W-SUN
(which is same as W
btw)
df["event_timestamp_et"].dt.to_period("W-SAT").dt.start_time
Output:
0 2022-05-22
1 2022-05-22
2 2022-05-22
3 2022-08-14
4 2022-04-03
Name: event_timestamp_et, dtype: datetime64[ns]
Validation:
s = pd.Series(["2022-05-21 00:25:13",
"2022-05-22 00:25:13",
"2022-05-23 00:25:13",
"2022-05-24 00:25:13",
"2022-05-25 00:25:13",
"2022-05-26 00:25:13",
"2022-05-27 00:25:13",
"2022-05-28 00:25:13",
"2022-05-29 00:25:13",
"2022-05-30 00:25:13",])
df = pd.to_datetime(s).to_frame("date")
df["name"] = df["date"].dt.day_name()
df["start"] = df["date"].dt.to_period("W-SAT").dt.start_time
df["start_day"] = df["start"].dt.day_name()
Output:
date name start start_day
0 2022-05-21 00:25:13 Saturday 2022-05-15 Sunday
1 2022-05-22 00:25:13 Sunday 2022-05-22 Sunday
2 2022-05-23 00:25:13 Monday 2022-05-22 Sunday
3 2022-05-24 00:25:13 Tuesday 2022-05-22 Sunday
4 2022-05-25 00:25:13 Wednesday 2022-05-22 Sunday
5 2022-05-26 00:25:13 Thursday 2022-05-22 Sunday
6 2022-05-27 00:25:13 Friday 2022-05-22 Sunday
7 2022-05-28 00:25:13 Saturday 2022-05-22 Sunday
8 2022-05-29 00:25:13 Sunday 2022-05-29 Sunday
9 2022-05-30 00:25:13 Monday 2022-05-29 Sunday
CodePudding user response:
You can convert to weekly period ending on Saturdays:
df['event_timestamp_et'].dt.to_period('W-SAT').dt.start_time
Output:
0 2022-05-22
1 2022-05-22
2 2022-05-22
3 2022-08-14
4 2022-04-03
Name: event_timestamp_et, dtype: datetime64[ns]
CodePudding user response:
A solution with pandas.tseries.offsets.Week
:
col_date = df['event_timestamp_et']
#Is the date concides with the start of the week (Sunday) ?
mask = col_date.ne(col_date - pd.offsets.Week(weekday=6) pd.offsets.Week())
df['startWeek'] = col_date.mask(mask, col_date - pd.offsets.Week(weekday=6)).dt.date
# Output :
print(df)
id event_timestamp_et startWeek
0 0 2022-05-22 00:25:13 2022-05-22
1 1 2022-05-25 14:59:50 2022-05-22
2 2 2022-05-28 17:32:37 2022-05-22
3 3 2022-08-15 11:07:52 2022-08-14
4 4 2022-04-08 16:52:39 2022-04-03
CodePudding user response:
One method I tried that work was to use numpy to find the "Sundays". If it's a "Sunday", just use that date, otherwise, get previous Sunday. Is there a better method?
import pandas as pd
import numpy as np
data = [
['0', '2022-05-22 00:25:13'],
['1', '2022-05-25 14:59:50'],
['2', '2022-05-28 17:32:37'],
['3', '2022-08-15 11:07:52'],
['4', '2022-04-08 16:52:39']]
columns = ['id', 'event_timestamp_et']
df = pd.DataFrame(data, columns=columns)
df['event_timestamp_et'] = pd.to_datetime(df['event_timestamp_et'])
df['startWeek'] = np.where(df['event_timestamp_et'].dt.day_name() == 'Sunday', df['event_timestamp_et'].dt.normalize(), df['event_timestamp_et'] - pd.offsets.Week(weekday=6))
df['startWeek'] = pd.to_datetime(df['startWeek'])
df['startWeek'] = df['startWeek'].dt.normalize()