Create 15minute time intervals from given time in a column and find count of values in each time int-CodePudding

Given this dataset that contains the datetime for an event and the datetime of ticket scan

event_name	event_datetime	scan_datetime
Game 1	2018-10-17 19:30:00	2018-10-17 20:01:20 2018-10-17 19:20:10 2018-10-17 21:44:43 2018-10-17 20:30:46 2018-10-17 19:51:56
...	...	...
Game 2	2019-04-10 19:30:00	2019-04-10 19:39:35 2019-04-10 30:30:49 2019-04-10 20:41:10 2019-04-10 19:46:20 2019-04-10 22:24:19

And the desired output should be a column with every 15 minute time intervals before and after event_datetime with values of scan_datetime

Time_intervals	90-75	75-60	60-45	45-30	30-15	15-0	0-15	15-30	30-45	45-60
count	2	1	5	6	4	3	25	7	4	1

CodePudding user response：

First, you need to extract datetime from scan_datetime column then explode it. Next, compute the time delta between event_datetime and scan_datetime columns. Finally, bin values and count number of occurrences.

DT = r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}'
df['scan_datetime'] = df['scan_datetime'].str.findall(DT)

df = df.explode('scan_datetime').astype({'event_datetime': 'datetime64', 
                                         'scan_datetime': 'datetime64'})

df['diff'] = df['event_datetime'].sub(df['scan_datetime']).dt.total_seconds().div(60)

bins = np.arange(24*-15, 24*15 1, 15)
labels = [f'{abs(i)}-{abs(j)}' for i, j in zip(bins, bins[1:])]

out = pd.cut(df['diff'], bins=bins, labels=labels) \
        .value_counts(sort=False).to_frame('count').T

Output:

>>> out
       180-165  165-150  150-135  135-120  120-105  105-90  90-75  75-60  60-45  45-30  ...  30-45  45-60  60-75  75-90  90-105  105-120  120-135  135-150  150-165  165-180
count        1        0        0        1        0       0      0      3      0      1  ...      0      0      0      0       0        0        0        0        0        0

[1 rows x 24 columns]

CodePudding user response：

First you explode the list in scan_datetime.

df = df.explode("scan_datetime").reset_index(drop=True)

This will make each scan a separate row. I am assuming the values in the list are strings, so we convert them to datetime:

df["scan_datetime"] = pd.to_datetime(df["scan_datetime"])

Than you calculate the difference:

df["diff"] = df["event_datetime"] - df["scan_datetime"]
df["diff"] = (df["diff"].dt.total_seconds()/60).astype(int)

Now you differences in minutes(signed integers). Than you run value counts with the specified intervals.

df["diff"].value_counts(bins=[-90,-75,-60, -45, -30, -15, 0, 15, 30, 45, 60])