I have a table that looks like below (only a sample provided)
date | memory | cpu
---------------------------- ----------- ----------------
2022-10-06 02:03:30.102194 | 37 | -1
2022-10-06 02:03:30.106863 | 37 | -1
2022-10-06 02:03:30.110286 | 37 | -1
2022-10-06 02:03:30.114442 | 37 | -1
2022-10-06 02:03:30.118419 | 37 | -1
2022-10-06 02:03:30.121983 | 31.519346 | 7.0, 7.0, 7.0
2022-10-06 02:03:30.125608 | 31.519346 | 7.0, 7.0, 7.0
2022-10-06 02:03:30.12989 | 31.519346 | 7.0, 7.0, 7.0
2022-10-06 02:03:30.134142 | 31.519346 | 7.0, 7.0, 7.0
2022-10-06 02:03:30.137253 | 31.519346 | 7.0, 7.0, 7.0
2022-10-06 02:03:30.143885 | 31.542543 | 24.0
2022-10-06 02:03:30.148455 | 31.542543 | 24.0
2022-10-06 02:03:30.151906 | 30.906942 | 11.0
2022-10-06 02:03:30.155327 | 35.317715 | 12.0, 6.0
2022-10-06 02:03:30.161415 | 35.317715 | 12.0, 6.0
2022-10-06 02:03:30.16701 | 35.317715 | 12.0, 6.0
2022-10-06 02:03:30.170698 | 35.317715 | 12.0, 6.0
2022-10-06 02:03:30.174565 | 17.155323 | 1.0
2022-10-06 02:03:30.177928 | 17.155323 | 1.0
2022-10-06 02:03:30.181603 | 17.155323 | 1.0
2022-10-06 02:03:30.185589 | 32.224056 | 14.0
2022-10-06 02:03:30.188934 | 32.224056 | 14.0
2022-10-06 02:03:30.193468 | 7.368925 | 1.0
2022-10-06 02:03:30.198035 | 7.368925 | 1.0
2022-10-06 02:03:30.201166 | 31.726305 | 27.0
2022-10-06 02:03:30.205302 | -1 | 4.0, 3.0, 3.0
2022-10-06 02:03:30.208874 | -1 | 4.0, 3.0, 3.0
2022-10-06 02:03:30.212153 | -1 | 4.0, 3.0, 3.0
2022-10-06 02:03:30.215573 | 18.09042 | 1.0
2022-10-06 02:03:30.219527 | 41.8536 | 11.0, 1.0, 1.0
2022-10-06 02:03:30.222841 | 41.8536 | 11.0, 1.0, 1.0
2022-10-06 02:03:30.226778 | 27.492594 | 23.0
2022-10-06 02:03:30.230332 | 8.857008 | 2.0
2022-10-06 02:03:30.234988 | 8.857008 | 2.0
2022-10-06 02:03:30.238463 | 8.857008 | 2.0
2022-10-06 02:03:30.242749 | 20.523735 | 4.0, 2.0
2022-10-06 02:03:30.246553 | 20.523735 | 4.0, 2.0
2022-10-06 02:03:30.250473 | 20.523735 | 4.0, 2.0
2022-10-06 02:03:30.254418 | 20.523735 | 4.0, 2.0
2022-10-06 02:03:30.25883 | 20.523735 | 4.0, 2.0
2022-10-06 02:03:30.26222 | 29.214113 | 13.0
2022-10-06 02:03:30.265766 | 29.214113 | 13.0
As you can see this table stores cpu
and memory
readings along with date
. The cpu
columns contains comma separated values. So I split it out into individual columns like so
SQL
SELECT
date,
memory,
split_part(cpu, ',', 1) AS cpu1,
split_part(cpu, ',', 2) AS cpu2,
split_part(cpu, ',', 3) AS cpu3,
split_part(cpu, ',', 4) AS cpu4
FROM
webhook_devicedata
where
date between '2022-10-06 02:03:00'
and '2022-10-06 02:04:00'
order by
date;
SQL Output
date | memory | cpu1 | cpu2 | cpu3 | cpu4
---------------------------- ----------- ------ ------ ------ ------
2022-10-06 02:03:30.102194 | 37 | -1 | | |
2022-10-06 02:03:30.106863 | 37 | -1 | | |
2022-10-06 02:03:30.110286 | 37 | -1 | | |
2022-10-06 02:03:30.114442 | 37 | -1 | | |
2022-10-06 02:03:30.118419 | 37 | -1 | | |
2022-10-06 02:03:30.121983 | 31.519346 | 7.0 | 7.0 | 7.0 |
2022-10-06 02:03:30.125608 | 31.519346 | 7.0 | 7.0 | 7.0 |
2022-10-06 02:03:30.12989 | 31.519346 | 7.0 | 7.0 | 7.0 |
2022-10-06 02:03:30.134142 | 31.519346 | 7.0 | 7.0 | 7.0 |
2022-10-06 02:03:30.137253 | 31.519346 | 7.0 | 7.0 | 7.0 |
2022-10-06 02:03:30.143885 | 31.542543 | 24.0 | | |
2022-10-06 02:03:30.148455 | 31.542543 | 24.0 | | |
2022-10-06 02:03:30.151906 | 30.906942 | 11.0 | | |
2022-10-06 02:03:30.155327 | 35.317715 | 12.0 | 6.0 | |
2022-10-06 02:03:30.161415 | 35.317715 | 12.0 | 6.0 | |
2022-10-06 02:03:30.16701 | 35.317715 | 12.0 | 6.0 | |
2022-10-06 02:03:30.170698 | 35.317715 | 12.0 | 6.0 | |
2022-10-06 02:03:30.174565 | 17.155323 | 1.0 | | |
2022-10-06 02:03:30.177928 | 17.155323 | 1.0 | | |
2022-10-06 02:03:30.181603 | 17.155323 | 1.0 | | |
2022-10-06 02:03:30.185589 | 32.224056 | 14.0 | | |
2022-10-06 02:03:30.188934 | 32.224056 | 14.0 | | |
2022-10-06 02:03:30.193468 | 7.368925 | 1.0 | | |
2022-10-06 02:03:30.198035 | 7.368925 | 1.0 | | |
2022-10-06 02:03:30.201166 | 31.726305 | 27.0 | | |
2022-10-06 02:03:30.205302 | -1 | 4.0 | 3.0 | 3.0 |
2022-10-06 02:03:30.208874 | -1 | 4.0 | 3.0 | 3.0 |
2022-10-06 02:03:30.212153 | -1 | 4.0 | 3.0 | 3.0 |
2022-10-06 02:03:30.215573 | 18.09042 | 1.0 | | |
2022-10-06 02:03:30.219527 | 41.8536 | 11.0 | 1.0 | 1.0 |
2022-10-06 02:03:30.222841 | 41.8536 | 11.0 | 1.0 | 1.0 |
2022-10-06 02:03:30.226778 | 27.492594 | 23.0 | | |
2022-10-06 02:03:30.230332 | 8.857008 | 2.0 | | |
2022-10-06 02:03:30.234988 | 8.857008 | 2.0 | | |
2022-10-06 02:03:30.238463 | 8.857008 | 2.0 | | |
2022-10-06 02:03:30.242749 | 20.523735 | 4.0 | 2.0 | |
2022-10-06 02:03:30.246553 | 20.523735 | 4.0 | 2.0 | |
2022-10-06 02:03:30.250473 | 20.523735 | 4.0 | 2.0 | |
2022-10-06 02:03:30.254418 | 20.523735 | 4.0 | 2.0 | |
2022-10-06 02:03:30.25883 | 20.523735 | 4.0 | 2.0 | |
2022-10-06 02:03:30.26222 | 29.214113 | 13.0 | | |
2022-10-06 02:03:30.265766 | 29.214113 | 13.0 | | |
Now I want to group by date
and get average of memory
and individual cpu
columns on an hourly basis. If you look carefully, the column cpu1
has all the values. So running the query below works
SQL
SELECT
date_trunc('hour', date) as hourly,
avg(COALESCE(split_part(cpu, ',', 1) :: text) :: float) AS cpu1
FROM
webhook_devicedata
where
date between '2022-10-06 02:03:00'
and '2022-10-06 02:04:00'
group by
hourly;
I get the result
hourly | cpu1
--------------------- -------------------
2022-10-06 02:00:00 | 7.309523809523809
But as soon as I start involving other cpu
columns, I get an error
SQL
SELECT
date_trunc('hour', date) as hourly,
avg(COALESCE(split_part(cpu, ',', 1) :: text) :: float) AS cpu1,
avg(COALESCE(split_part(cpu, ',', 2) :: text) :: float) AS cpu2
FROM
webhook_devicedata
where
date between '2022-10-06 02:03:00'
and '2022-10-06 02:04:00'
group by
hourly;
Error
ERROR: invalid input syntax for type double precision: ""
CONTEXT: parallel worker
I even tried to get rid of null
and empty strings by writing the query like below
SQL
SELECT
date_trunc('hour', sub2.date) as hourly,
avg(sub2.cpu1) as avg_cpu1,
avg(sub2.cpu2) as avg_cpu2
from
(
select
date,
memory,
split_part(cpu, ',', 1) :: text as cpu1,
split_part(cpu, ',', 2) :: text as cpu2
FROM
(
select
date,
memory,
cpu
from
webhook_devicedata
where
date between '2022-10-06 02:03:00'
and '2022-10-06 02:04:00'
and cpu != ''
and cpu is not null
) as sub
) as sub2
group by
hourly;
But I get the below error
Error
ERROR: function avg(text) does not exist
LINE 1: SELECT date_trunc('hour', sub2.date) as hourly, avg(sub2.cpu...
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
What am I doing wrong? How can I get the average of all the columns?
CodePudding user response:
You must check values using case, example:
SELECT
date_trunc('hour', date) as hourly,
avg(case when split_part(cpu, ',', 1) = '' then 0 else split_part(cpu, ',', 1):: float end) AS cpu1,
avg(case when split_part(cpu, ',', 2) = '' then 0 else split_part(cpu, ',', 2):: float end) AS cpu2
FROM
webhook_devicedata
where
date between '2022-10-06 02:03:00'
and '2022-10-06 02:04:00'
group by
hourly;