I have the following dataframe:
import pandas as pd
dict = {
"id": [1, 2, 2, 3, 3],
"start_time": [
"2022-08-30 08:00:02",
"2022-08-30 08:03:07",
"2022-08-30 08:06:52",
"2022-08-30 08:20:02",
"2022-08-30 08:20:45",
],
"end_time": [
"2022-08-30 08:00:02",
"2022-08-30 08:05:12",
"2022-08-30 08:06:52",
"2022-08-30 08:20:27",
"2022-08-30 08:22:27",
],
}
df = pd.DataFrame(dict)
I would like to count the number of ids per 5-minute interval,
starting from 2022-08-30 08:00:01 and ending at 2022-08-30 08:25:00.
The expected output is:
new_dict = {
"interval start": [
"2022-08-30 08:00:01",
"2022-08-30 08:05:01",
"2022-08-30 08:10:01",
"2022-08-30 08:15:01",
"2022-08-30 08:20:01",
],
"interval end": [
"2022-08-30 08:05:00",
"2022-08-30 08:10:00",
"2022-08-30 08:15:00",
"2022-08-30 08:20:00",
"2022-08-30 08:25:00",
],
"count": [2, 2, 0, 0, 2],
}
new_df = pd.DataFrame(new_dict)
How can I do this, please?
CodePudding user response:
If you can use the input data in the long format from the previous solution (one row per id and timestamp), use:
dict_df = {
"id": [1, 2, 2, 2, 3, 3, 3, 3],
"time": [
"2022-08-30 08:00:02",
"2022-08-30 08:03:07",
"2022-08-30 08:05:12",
"2022-08-30 08:06:52",
"2022-08-30 08:00:02",
"2022-08-30 08:00:27",
"2022-08-30 08:20:45",
"2022-08-30 08:22:27",
],
}
df = pd.DataFrame(dict_df)
df['time'] = pd.to_datetime(df['time'])
df = df.groupby(pd.Grouper(freq='5Min', key='time'))['id'].nunique().reset_index(name='count')
df = df.rename(columns={'time':'interval start'})
df.insert(1, 'interval end', df['interval start'] + pd.Timedelta('5Min'))
print (df)
       interval start        interval end  count
0 2022-08-30 08:00:00 2022-08-30 08:05:00      3
1 2022-08-30 08:05:00 2022-08-30 08:10:00      1
2 2022-08-30 08:10:00 2022-08-30 08:15:00      0
3 2022-08-30 08:15:00 2022-08-30 08:20:00      0
4 2022-08-30 08:20:00 2022-08-30 08:25:00      1
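If you only have the original wide frame with start_time and end_time, the approach above might be adapted roughly as follows. This is a minimal sketch, not a definitive solution: it assumes that both the start and the end timestamp of a row should count towards a bin, and that "unique ids per bin" is what is wanted, as in the code above. The names raw, long_df and counts are made up for illustration. The offset="1s" argument of pd.Grouper shifts the bin edges from hh:mm:00 to hh:mm:01, which is what the question asks for.

```python
import pandas as pd

# Original wide frame from the question: one row per (id, start_time, end_time).
raw = pd.DataFrame({
    "id": [1, 2, 2, 3, 3],
    "start_time": [
        "2022-08-30 08:00:02",
        "2022-08-30 08:03:07",
        "2022-08-30 08:06:52",
        "2022-08-30 08:20:02",
        "2022-08-30 08:20:45",
    ],
    "end_time": [
        "2022-08-30 08:00:02",
        "2022-08-30 08:05:12",
        "2022-08-30 08:06:52",
        "2022-08-30 08:20:27",
        "2022-08-30 08:22:27",
    ],
})

# Stack start_time and end_time into a single "time" column (long format).
long_df = raw.melt(
    id_vars="id",
    value_vars=["start_time", "end_time"],
    value_name="time",
)
long_df["time"] = pd.to_datetime(long_df["time"])

# Count unique ids per 5-minute bin; offset="1s" moves the bin edges
# to 08:00:01, 08:05:01, ... instead of 08:00:00, 08:05:00, ...
counts = (
    long_df.groupby(pd.Grouper(key="time", freq="5min", offset="1s"))["id"]
    .nunique()
    .reset_index(name="count")
    .rename(columns={"time": "interval start"})
)

# Inclusive interval end: one second before the next bin starts.
counts.insert(
    1,
    "interval end",
    counts["interval start"] + pd.Timedelta("5min") - pd.Timedelta("1s"),
)

print(counts)
```

With this data the bins run from 08:00:01 to 08:25:00 and the count column comes out as 2, 1, 0, 0, 1. That differs from the counts listed in the question because each id is counted at most once per bin here; if rows rather than unique ids should be counted, swapping nunique() for size() or count() would be the place to start.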