data = {'col_ts': ['2022-11-02T08:26:40', '2022-11-02T08:25:10', '2022-11-02T08:26:00', '2022-11-02T08:30:20',
'2022-11-02T08:33:30', '2022-11-02T08:36:40', '2022-11-02T08:26:20', '2022-11-02T08:50:10',
'2022-11-02T08:30:40', '2022-11-02T08:39:40']}
df = pd.DataFrame(data, columns = ['col_ts'])
df
I have a data set from that I would like to create two columns such as start_time and end_time, as shown below with 5 minutes Interval. Appreciate your help on this. In SQL, I have used the below code to produce the result.
time_slice(col_ts, 5, 'MINUTE', 'START') as START_INTERVAL,
time_slice(col_ts, 5, 'MINUTE', 'END') as END_INTERVAL,
In Pandas, I have used the below code. Unfortunately, that will give me a row-level interval.
df.resample("5T").mean()
CodePudding user response:
Here is one way to do it using Pandas to_datetime and dt.accessor:
df["col_ts"] = pd.to_datetime(df["col_ts"])
df["start_interval"] = df["col_ts"].dt.floor("5T")
df["end_interval"] = df["col_ts"].dt.ceil("5T")
Then:
col_ts start_interval end_interval
0 2022-11-02 08:26:40 2022-11-02 08:25:00 2022-11-02 08:30:00
1 2022-11-02 08:25:10 2022-11-02 08:25:00 2022-11-02 08:30:00
2 2022-11-02 08:26:00 2022-11-02 08:25:00 2022-11-02 08:30:00
3 2022-11-02 08:30:20 2022-11-02 08:30:00 2022-11-02 08:35:00
4 2022-11-02 08:33:30 2022-11-02 08:30:00 2022-11-02 08:35:00
5 2022-11-02 08:36:40 2022-11-02 08:35:00 2022-11-02 08:40:00
6 2022-11-02 08:26:20 2022-11-02 08:25:00 2022-11-02 08:30:00
7 2022-11-02 08:50:10 2022-11-02 08:50:00 2022-11-02 08:55:00
8 2022-11-02 08:30:40 2022-11-02 08:30:00 2022-11-02 08:35:00
9 2022-11-02 08:39:40 2022-11-02 08:35:00 2022-11-02 08:40:00