Home > Software engineering >  Pandas - Convert datetime column to start of week date (Sunday)
Pandas - Convert datetime column to start of week date (Sunday)

Time:10-04

I've been trying a handful of methods here and can't seem to get it right. What I want to do is look at a datetime stamp column, and create a new column that has the Sunday start of that week.

It MOSTLY works, except if the datetime stamp is on a Sunday, then instead of giving that Sunday as the start of the week, it refers back to the previous Sunday.

Sample dataset:

import pandas as pd

data = [
['0',   '2022-05-22 00:25:13'],
['1',   '2022-05-25 14:59:50'],
['2',   '2022-05-28 17:32:37'],
['3',   '2022-08-15 11:07:52'],
['4',   '2022-04-08 16:52:39']]

columns = ['id', 'event_timestamp_et']

df = pd.DataFrame(data, columns=columns)
df['event_timestamp_et'] = pd.to_datetime(df['event_timestamp_et'])

Expected output:

    id  event_timestamp_et  startWeek
0   0   2022-05-22 00:25:13 2022-05-22
1   1   2022-05-25 14:59:50 2022-05-22
2   2   2022-05-28 17:32:37 2022-05-22
3   3   2022-08-15 11:07:52 2022-08-14
4   4   2022-04-08 16:52:39 2022-04-03

Here is what I have tried (I've included a few methods), as you can see, not quite working:

df['startWeek'] = df['event_timestamp_et'] - pd.offsets.Week(weekday=6)
df['startWeek'] = df['startWeek'].dt.normalize()

df['startWeek1'] = df['event_timestamp_et'] - pd.to_timedelta(df['event_timestamp_et'].dt.dayofweek, unit='d')
df['startWeek2'] = df['event_timestamp_et'].dt.to_period('W-SUN').apply(lambda r: r.start_time)
df["startWeek3"] = df['event_timestamp_et'].dt.to_period('W').dt.start_time   pd.Timedelta(6, unit='d')

Actual output:

    id  event_timestamp_et  startWeek   startWeek1  startWeek2  startWeek3
0   0   2022-05-22 00:25:13 2022-05-15  2022-05-16 00:25:13 2022-05-16  2022-05-22
1   1   2022-05-25 14:59:50 2022-05-22  2022-05-23 14:59:50 2022-05-23  2022-05-29
2   2   2022-05-28 17:32:37 2022-05-22  2022-05-23 17:32:37 2022-05-23  2022-05-29
3   3   2022-08-15 11:07:52 2022-08-14  2022-08-15 11:07:52 2022-08-15  2022-08-21
4   4   2022-04-08 16:52:39 2022-04-03  2022-04-04 16:52:39 2022-04-04  2022-04-10

CodePudding user response:

One way using W-SAT not W-SUN (which is same as W btw)

df["event_timestamp_et"].dt.to_period("W-SAT").dt.start_time

Output:

0   2022-05-22
1   2022-05-22
2   2022-05-22
3   2022-08-14
4   2022-04-03
Name: event_timestamp_et, dtype: datetime64[ns]

Validation:

s = pd.Series(["2022-05-21 00:25:13",
               "2022-05-22 00:25:13",
               "2022-05-23 00:25:13",
               "2022-05-24 00:25:13",
               "2022-05-25 00:25:13",
               "2022-05-26 00:25:13",
               "2022-05-27 00:25:13",
               "2022-05-28 00:25:13",
               "2022-05-29 00:25:13",
               "2022-05-30 00:25:13",])
df = pd.to_datetime(s).to_frame("date")
df["name"] = df["date"].dt.day_name()
df["start"] = df["date"].dt.to_period("W-SAT").dt.start_time
df["start_day"] = df["start"].dt.day_name()

Output:

                 date       name      start start_day
0 2022-05-21 00:25:13   Saturday 2022-05-15    Sunday
1 2022-05-22 00:25:13     Sunday 2022-05-22    Sunday
2 2022-05-23 00:25:13     Monday 2022-05-22    Sunday
3 2022-05-24 00:25:13    Tuesday 2022-05-22    Sunday
4 2022-05-25 00:25:13  Wednesday 2022-05-22    Sunday
5 2022-05-26 00:25:13   Thursday 2022-05-22    Sunday
6 2022-05-27 00:25:13     Friday 2022-05-22    Sunday
7 2022-05-28 00:25:13   Saturday 2022-05-22    Sunday
8 2022-05-29 00:25:13     Sunday 2022-05-29    Sunday
9 2022-05-30 00:25:13     Monday 2022-05-29    Sunday

CodePudding user response:

You can convert to weekly period ending on Saturdays:

df['event_timestamp_et'].dt.to_period('W-SAT').dt.start_time

Output:

0   2022-05-22
1   2022-05-22
2   2022-05-22
3   2022-08-14
4   2022-04-03
Name: event_timestamp_et, dtype: datetime64[ns]

CodePudding user response:

A solution with pandas.tseries.offsets.Week :

col_date = df['event_timestamp_et']

#Is the date concides with the start of the week (Sunday) ?
mask = col_date.ne(col_date - pd.offsets.Week(weekday=6)   pd.offsets.Week())

df['startWeek'] = col_date.mask(mask, col_date - pd.offsets.Week(weekday=6)).dt.date

# Output :

print(df)

  id  event_timestamp_et   startWeek
0  0 2022-05-22 00:25:13  2022-05-22
1  1 2022-05-25 14:59:50  2022-05-22
2  2 2022-05-28 17:32:37  2022-05-22
3  3 2022-08-15 11:07:52  2022-08-14
4  4 2022-04-08 16:52:39  2022-04-03

CodePudding user response:

One method I tried that work was to use numpy to find the "Sundays". If it's a "Sunday", just use that date, otherwise, get previous Sunday. Is there a better method?

import pandas as pd
import numpy as np

data = [
['0',   '2022-05-22 00:25:13'],
['1',   '2022-05-25 14:59:50'],
['2',   '2022-05-28 17:32:37'],
['3',   '2022-08-15 11:07:52'],
['4',   '2022-04-08 16:52:39']]

columns = ['id', 'event_timestamp_et']

df = pd.DataFrame(data, columns=columns)
df['event_timestamp_et'] = pd.to_datetime(df['event_timestamp_et'])


df['startWeek'] = np.where(df['event_timestamp_et'].dt.day_name() == 'Sunday', df['event_timestamp_et'].dt.normalize(), df['event_timestamp_et'] - pd.offsets.Week(weekday=6))
df['startWeek'] = pd.to_datetime(df['startWeek'])
df['startWeek'] = df['startWeek'].dt.normalize()
  • Related