I have a MySQL 8.0.23 table called events with the following schema:
eventUID - integer NOT NULL
camtimestamp - MySQL datetime stamp
direction - string - either "In" or "Out"
propUID - integer
In a single SELECT statement, I am trying to determine by hour for the last 24 hours how many cars are "In" and how many are "Out". Here is what I am trying (does not yet have the 24 hour limit built in).
select camtimestamp,count(*) from events where direction ="In" and propUID = 7 group by year(camtimestamp),month(camtimestamp),day(camtimestamp),hour(camtimestamp);
And this is a sample of what I am getting.
2022-02-14 22:02:40 38
2022-02-14 21:56:56 15
2022-02-14 20:55:30 47
2022-02-14 19:59:18 51
2022-02-14 18:59:50 36
2022-02-14 17:52:04 10
2022-02-14 16:58:01 16
2022-02-14 15:59:00 36
2022-02-14 14:58:52 44
I also have a table called datehourlist with which I can join in my SELECT.
Sample data:
2019-05-01 00:00:00
2019-05-01 01:00:00
2019-05-01 02:00:00
2019-05-01 03:00:00
2019-05-01 04:00:00
2019-05-01 05:00:00
2019-05-01 06:00:00
2019-05-01 07:00:00
2019-05-01 08:00:00
Also:
mysql> select min(datehour) from datehourlist;
---------------------
| min(datehour) |
---------------------
| 2019-05-01 00:00:00 |
---------------------
1 row in set (0.02 sec)
mysql> select max(datehour) from datehourlist;
---------------------
| max(datehour) |
---------------------
| 2040-12-31 00:00:00 |
---------------------
1 row in set (0.02 sec)
datehourlist has every hour in it from May 1, 2019 until December 31, 2040.
This is a sample of what I really want from this:
Column 1 below is a rounded grouped timestamp (vs col 1 above being a non-rounded, actual timestamp)
Column 1 below does not skip an hour if there is no data from that hour.
Column 2 below is the "In" count for that hour.
Column 3 below is the "Out" count for that hour.
2019-05-02 06:00:00 5 10
2019-05-02 07:00:00 127 10
2019-05-02 08:00:00 0 0
2019-05-02 09:00:00 115 10
2019-05-02 10:00:00 71 10
2019-05-02 11:00:00 147 10
2019-05-02 12:00:00 140 10
What SELECT statement should I use to get the output I need?
Also, how would I optimize that SELECT statement?
Within events, I have 500k events and growing by 100s everyday.
Thank you in advance for your help.
Thank you for a great solution so quickly.
SELECT dhl.datehour datehour,
COALESCE(SUM(ev.direction = 'In'), 0) `In`,
COALESCE(SUM(ev.direction = 'Out'), 0) `Out`
FROM datehourlist dhl
LEFT JOIN events ev
ON DATE_FORMAT(ev.camtimestamp, '%Y-%m-%d %H:00:00') = dhl.datehour
WHERE ev.camtimestamp >= DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00') - INTERVAL 24 HOUR
AND ev.camtimestamp < DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00')
AND ev.propUID = 7
GROUP BY dhl.datehour;
CodePudding user response:
First, we need a trunc_to_hour() function that takes an arbitrary DATETIME
or TIMESTAMP
value and gives back the beginning of its hour. That is this.
DATE_FORMAT(camtimestamp, '%Y-%m-%d %H:00:00')
Second, we need a WHERE expression that can handle the most recent 24 hours. That is this.
WHERE camtimestamp >= DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00') - INTERVAL 24 HOUR
AND camtimestamp < DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00')
For the example timestamp 2021-03-14 16:04:30
this gives the following.
WHERE camtimestamp >= `2021-03-13 16:00:00`
AND camtimestamp < '2021-03-14 16:00:00`
That is, it chooses records for the most recent 24 complete clock hours. You may have to adjust this WHERE expression if you want the hours to date.
Third, we need conditional sums (for In and Out).
It happens that the expression direction = 'In'
gives 1
when direction
is In
, 0
when direction
is some other string (like Out
), and NULL if direction
itself is NULL. So
SUM(direction='In')
counts the rows meeting that criterion.
Fourth, when the SUM is null, we want to show zero. Like this.
COALESCE(SUM(direction='In'),0)
Fifth, we can put it together something like this:
SELECT DATE_FORMAT(ev.camtimestamp, '%Y-%m-%d %H:00:00') datehour,
COALESCE(SUM(ev.direction = 'In'), 0) `In`,
COALESCE(SUM(ev.direction = 'Out'), 0) `Out`
FROM events ev
WHERE ev.camtimestamp >= DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00') - INTERVAL 24 HOUR
AND ev.camtimestamp < DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00')
AND ev.propUID = 7
GROUP BY DATE_FORMAT(ev.camtimestamp, '%Y-%m-%d %H:00:00')
That gives you your result set. But it still might be missing some hours if there are no records for those hours.
So, sixth, we can join that to your pre-existing hourly calendar table like this:
SELECT dhl.datehour datehour,
COALESCE(SUM(ev.direction = 'In'), 0) `In`,
COALESCE(SUM(ev.direction = 'Out'), 0) `Out`
FROM datehourlist dhl
LEFT JOIN events ev
ON DATE_FORMAT(ev.camtimestamp, '%Y-%m-%d %H:00:00') = dhl.datehour
WHERE dhl.datehour >= `2021-03-13 16:00:00`
AND dhl.datehour < '2021-03-14 16:00:00`
AND ev.propUID = 7
GROUP BY dhl.datehour
And that should do it. (Not debugged.)