Home > other >  SparkSQL2.0 Query to count number of requests every 15 minutes within past hour
SparkSQL2.0 Query to count number of requests every 15 minutes within past hour

Time:11-19

I'm trying to write a SQL query to count the number of requests received every 15 minutes within the past hour. I want to return the last three 15 minute intervals.

My data looks like:

log_date, time
2022-11-18, 2022-11-18 17:00:00 UTC
2022-11-18, 2022-11-18 17:10:00 UTC
2022-11-18, 2022-11-18 17:20:00 UTC
2022-11-18, 2022-11-18 17:40:00 UTC

My query so far:

select log_date, FLOOR(unix_timestamp(time)) div (15 * 60) as intervals, count(*)
from data_log
where log_date >= date_sub(from_unixtime(unix_timestamp(), 'yyyy-MM-dd'), 1)
and requests_recieved = true
group by log_date, time
order by intervals desc limit 3

I want it to look like:

log_date, time, count
2022-11-18, 2022-11-18 17:00:00 UTC, 2
2022-11-18, 2022-11-18 17:15:00 UTC, 1
2022-11-18, 2022-11-18 17:30:00 UTC, 1

Update: I'm using SparkSQL 2.3.1

CodePudding user response:

Convert to time every 15 minutes with the TIMESTAMPADD function.

SELECT * FROM (
  SELECT
    log_date,
    TIMESTAMPADD(
      MINUTE,
      hour(time) * 60   (minute(time) div 15 * 15),
      date(time)) time15m,
    COUNT(*) count
  FROM t1
  GROUP BY log_date, time15m
  ORDER BY time15m DESC
  LIMIT 3
) s
ORDER BY time15m;

CodePudding user response:

This is mostly about date/time fuctions:

  1. convert log_time to 15-minute interval by
    • truncate to nearest hour with date_format()
    • add 0 to 3 15-minute intervals with date_add()
  2. Follow the same idea to calculate the last 15-minute interval in the past hour as start interval.

Step 1. create data_log table and prep test data

create table data_log (
    log_date            date,
    log_time            timestamp,
    requests_received   boolean);

insert into data_log
values
('2022-11-18', '2022-11-18 17:55:00', true),
('2022-11-18', '2022-11-18 18:00:00', true),
('2022-11-18', '2022-11-18 18:10:00', true),
('2022-11-18', '2022-11-18 18:20:00', true),
('2022-11-18', '2022-11-18 18:40:00', true),
('2022-11-18', '2022-11-18 18:50:00', true);

Step 2. Query log entries in the past hour and aggregate to 15-minute interval:

with an_hour_ago as (
select date_add(current_timestamp, interval -1 hour)  as an_hour_ago),
interval_an_hour_ago as (
select date_add(date_format(an_hour_ago, '%y-%m-%d %H:00:00'), interval 15 * floor(date_format(an_hour_ago, '%i') / 15) minute) as start_interval
  from an_hour_ago)
select log_date,
       date_add(date_format(log_time,'%y-%m-%d %H:00:00'), interval 15 * floor(date_format(log_time,'%i') / 15) minute ) as log_interval,
       count(*)  as count
  from data_log d,
       interval_an_hour_ago i  
 where d.requests_received = true
   and d.log_time >= i.start_interval
 group by 1, 2;

Result:

log_date  |log_interval       |count|
---------- ------------------- ----- 
2022-11-18|2022-11-18 18:00:00|    2|
2022-11-18|2022-11-18 18:15:00|    1|
2022-11-18|2022-11-18 18:30:00|    1|
2022-11-18|2022-11-18 18:45:00|    1|

CodePudding user response:

This query compatible with all MySQL versions.

SET time_zone = 'UTC';
SET @i=0;
SET @t_now = unix_timestamp();  /*Request time*/
SET @hr = 3600 * (@t_now div 3600);  /*Request hour*/

SELECT log_date, from_unixtime(qr * 15 * 60   @hr) AS `time`,  COUNT(qr) AS `count` FROM (
  
  SELECT 
    log_date, 
    @i:=((@t_now - unix_timestamp(`time`)) div 60)  AS m, /* minutes elapsed */
    FLOOR(@i/15) as qr /*hour quarter*/
  FROM data_log WHERE requests_recieved = true     
  
) AS t1
WHERE m <= 60 /* Last hour only */
GROUP BY log_date, qr /* Group by each hour quarter */
ORDER BY log_date, qr
;

sample of results from my test data:

log_date    |time                 |count
------------ --------------------- -----
2022-11-19  | 2022-11-19 03:30:00 | 3
2022-11-19  | 2022-11-19 03:45:00 | 4

  • Related