I have a table net_score
that looks like below (only a sample provided)
date | ip | up_score
---------------------------- ----------------- ----------
2022-09-09 07:30:04.485979 | 12.22.19.0 | 51
2022-09-09 07:30:04.485979 | 10.22.39.1 | 95
2022-09-09 07:30:04.485979 | 14.260.13.1 | 100
2022-09-09 07:30:04.485979 | 252.229.219.43 | 97
2022-09-09 07:30:04.485979 | 10.551.343.10 | 97
2022-09-09 08:30:04.485979 | 12.22.19.0 | 11
2022-09-09 08:30:04.485979 | 10.22.39.1 | 54
2022-09-09 08:30:04.485979 | 14.260.13.1 | 89
2022-09-09 08:30:04.485979 | 252.229.219.43 | 37
2022-09-09 08:30:04.485979 | 10.551.343.10 | 11
2022-09-09 09:30:04.485979 | 12.22.19.0 | 54
2022-09-09 09:30:04.485979 | 10.22.39.1 | 15
2022-09-09 09:30:04.485979 | 14.260.13.1 | 90
2022-09-09 09:30:04.485979 | 252.229.219.43 | 17
2022-09-09 09:30:04.485979 | 10.551.343.10 | 50
As you can see I have a date
, ip
and up_score
column and the data is calculated at every hour.
What I want is a way to get hourly data for 24 hours as 24 separate columns per ip
and incase the data doesn't exists, simply put -1 for that column.
I can get hourly data by doing something like below
select date_trunc('hour', date) as hourly, ip, up_score
from net_score
where date between '2022-09-09 05:30:00' and '2022-09-10 05:30:00'
group by ip, hourly, up_score
order by ip, hourly
But my expected query should produce a table output like below (and fill missing hours value with -1)
ip | hour_0 | hour_1 | hour_2 | ..
-------------------------- --------- ---------- ------
12.22.19.0 | 51 | 11 | 54 | ..
10.22.39.1 | 95 | 54 | 15 | ..
14.260.13.1 | 100 | 89 | 90 | ..
252.229.219.43 | 97 | 37 | 17 | ..
10.551.343.10 | 97 | 11 | 50 | ..
How can I do this?
Note:
The reasoning behind doing it this way is that getting the output by using the query I know will give back lot of rows and for every new ip
that gets added, the output rows by increase by another 24. Taking this result and processing it (I am using python
in my code) will take longer as I deal with more and more ips
.
But if I try it the way I mentioned (24 columns way), I will only deal with one additional row per ip
and the number of columns continues to be 24 only. Hence I am thinking of doing it this way. Please correct me if my approach is wrong or can be improved.
CodePudding user response:
You can use PIVOT
technique in next way:
select
date_trunc('day', date) as daily, ip,
coalesce(sum(up_score) filter (where extract('hour' from date) = 6 ), 0) up_score_6,
sum(up_score) filter (where extract('hour' from date) = 7 ) up_score_7,
sum(up_score) filter (where extract('hour' from date) = 8 ) up_score_8,
sum(up_score) filter (where extract('hour' from date) = 9 ) up_score_9
from net_score
where date between '2022-09-09 05:30:00' and '2022-09-10 05:30:00'
group by ip, daily
order by ip, daily
Of course you need expand my example query for each of 24 hours
===================== ================ ============ ============ ============ ============
| daily | ip | up_score_6 | up_score_7 | up_score_8 | up_score_9 |
===================== ================ ============ ============ ============ ============
| 2022-09-09 00:00:00 | 10.22.39.1 | 0 | 95 | 54 | 15 |
--------------------- ---------------- ------------ ------------ ------------ ------------
| 2022-09-09 00:00:00 | 10.551.343.10 | 0 | 97 | 11 | 50 |
--------------------- ---------------- ------------ ------------ ------------ ------------
| 2022-09-09 00:00:00 | 12.22.19.0 | 0 | 51 | 11 | 54 |
--------------------- ---------------- ------------ ------------ ------------ ------------
| 2022-09-09 00:00:00 | 14.260.13.1 | 0 | 100 | 89 | 90 |
--------------------- ---------------- ------------ ------------ ------------ ------------
| 2022-09-09 00:00:00 | 252.229.219.43 | 0 | 97 | 37 | 17 |
--------------------- ---------------- ------------ ------------ ------------ ------------
When used old PostgreSQL version then CASE statemnt can be used:
select
date_trunc('day', date) as daily, ip,
sum(case when extract('hour' from date) = 6 then up_score else 0 end) up_score_6,
sum(case when extract('hour' from date) = 7 then up_score else 0 end) up_score_7,
sum(case when extract('hour' from date) = 8 then up_score else 0 end) up_score_8,
sum(case when extract('hour' from date) = 9 then up_score else 0 end) up_score_9
from net_score
where date between '2022-09-09 05:30:00' and '2022-09-10 05:30:00'
group by ip, daily
order by ip, daily