I need to nest the two queries. Query B output is N relevant to machine_id and it needs to go into the function LAG(utilisation, N) of query A. In the below query A 89 should be the result of query b - 1
Query A
SELECT
time AS "time",
machine_id,
utilisation - LAG(utilisation,89) OVER ( PARTITION BY machine_id ORDER BY time) as DataUsed
FROM hardware
WHERE
$__unixEpochFilter(time) AND
component = 'network'
ORDER BY time
Query B
SELECT
machine_id,
count(utilisation) AS N
FROM hardware
WHERE
$__unixEpochFilter(time) AND
component = "network"
GROUP BY machine_id
I am using this on Grafana so the time parameter is the range
here is an export of the table hardware where components are filtered to "network"
Panel Title-data-2021-11-27 13_12_14.csv
I have tried this but is not working
SELECT
A.time AS "time",
A.machine_id,
A.utilisation - LAG(A.utilisation,B.N - 1) OVER ( PARTITION BY A.machine_id ORDER BY A.time) as DataUsed
FROM hardware as A
INNER JOIN
(
SELECT
machine_id,
count(utilisation) AS N
FROM hardware
WHERE
$__unixEpochFilter(time) AND
component = "network"
GROUP BY machine_id
) as B on A.machine_id = B.machine_id
WHERE
$__unixEpochFilter(A.time) AND
component = 'network'
ORDER BY A.time
I have generated SQL query and the subquery works. I just don't know how to get B.N into LAG and replace 89
SELECT
A.time AS "time",
A.machine_id,
B.N,
A.utilisation - LAG(A.utilisation,89) OVER ( PARTITION BY A.machine_id ORDER BY A.time) as DataUsed
FROM hardware as A
INNER JOIN
(
SELECT
machine_id,
count(utilisation) AS N
FROM hardware
WHERE
time >= 1638058073 AND time <= 1638061673 AND
component = "network"
GROUP BY machine_id
) as B on A.machine_id = B.machine_id
WHERE
A.time >= 1638058073 AND A.time <= 1638061673 AND
component = 'network'
ORDER BY A.time
CodePudding user response:
Here are a few examples that calculate the data usage by subtracting it with the first (or lowest) data usage.
Note that I replaced the "time" column with "tyd" in my examples as this was giving me problems on my test setup.
I also commented out the $__unixEpochFilter(time)
lines as they are not applicable to my test setup. Uncomment it on your side.
Using Lag
The following will give you a single total data usage result for each machine_id. It basically uses a first WITH statement ("B") to determine the row count N, then a second WITH statement ("resultt") to calculate the data used using LAG and the row count N. Since the LAG calculation produces null for all entries except the last, the rows are filtered to be non-null right at the bottom.
-- Calculate single total data usage by counting rows and then using lag to
-- subtract the utilisation from the first utilisation row.
WITH B AS (
SELECT
machine_id,
count(utilisation) AS N
FROM hardware
WHERE
-- $__unixEpochFilter(time) AND
component = "network"
GROUP BY machine_id
),
resultt as (
SELECT
A.machine_id,
A.utilisation - (LAG(A.utilisation,(B.N - 1)) OVER ( PARTITION BY A.machine_id ORDER BY A.tyd)) as DataUsed
FROM hardware A
JOIN B ON B.machine_id = A.machine_id
WHERE
A.component = 'network'
)
SELECT
machine_id,
DataUsed
FROM resultt
WHERE DataUsed IS NOT NULL -- Since the lag calculation produces null for all rows but the last
Subtracting utilisation at first time
The following produces the same results as above, but instead of finding the first utilisation by means of LAG, it finds the utilisation value at the first time.
-- Calculate single total data used as the difference between the last
-- utilisation and the utilisation at first time.
-- This uses the first time to determine the first utilisation.
-- If utilisation is always increasing, calculation of the first time
-- is unnecessary, see the other script.
WITH first_time AS (
-- Get the first time for each machine_id
SELECT
machine_id,
min(tyd) AS min_tyd
FROM hardware
WHERE
-- $__unixEpochFilter(time) AND
component = 'network'
GROUP BY machine_id
), first_util AS (
-- Get the utilisation corresponding to the first time
SELECT
h.machine_id,
h.utilisation
FROM hardware h
JOIN first_time f ON f.machine_id = h.machine_id AND f.min_tyd = h.tyd
)
-- For each row the running utilisation can now be calculated as the difference
-- of the row's utilisation - first utilisation.
SELECT
h.tyd,
h.machine_id,
h.utilisation,
fu.utilisation AS first_util, -- Only for troubleshooting
max(h.utilisation - fu.utilisation) AS DataUsed
FROM hardware h
JOIN first_util fu ON fu.machine_id = h.machine_id
GROUP BY machine_id
ORDER BY h.machine_id, h.tyd, h.utilisation
Subtracting minimum utilisation
If the utilisation value is always increasing, the above script can be simplified by removing the time calculation and simply subtracting the first utilisation from the last one.
-- Calculate the single total data used as the difference between the last
-- and the first (lowest) utilisation for each machine_id.
-- This assumes that utilisation always increases as time passes.
WITH first_util AS (
-- Get the lowest utilisation for each machine_id
SELECT
machine_id,
min(utilisation) AS min_util
FROM hardware
WHERE
-- $__unixEpochFilter(time) AND
component = 'network'
GROUP BY machine_id
)
-- For each row the running utilisation can now be calculated as the difference
-- of the row's utilisation - first utilisation.
SELECT
h.tyd,
h.machine_id,
h.utilisation,
fu.min_util, -- Only for troubleshooting
max(h.utilisation - fu.min_util) AS DataUsed
FROM hardware h
JOIN first_util fu ON fu.machine_id = h.machine_id
GROUP BY machine_id
ORDER BY h.machine_id, h.tyd, h.utilisation
Bonus - running total
The above two scripts collapse the results into a single value to obtain the total data usage. To get a running total of the data usage, the max() and ORDER BY can simply be removed:
-- Calculate running total of data used minus the lowest utilisation.
-- This assumes that utilisation always increases as time passes.
WITH first_util AS (
-- Get the first (lowest) utilisation for each machine_id
SELECT
machine_id,
min(utilisation) as min_util
FROM hardware
WHERE
-- $__unixEpochFilter(time) AND
component = 'network'
GROUP BY machine_id
)
-- For each row the running utilisation can now be calculated as the difference
-- of the row's utilisation - first utilisation.
SELECT
h.tyd,
h.machine_id,
h.utilisation,
fu.min_util, -- Only for troubleshooting
h.utilisation - fu.min_util AS DataUsed
FROM hardware h
JOIN first_util fu ON fu.machine_id = h.machine_id
ORDER BY h.machine_id, h.tyd, h.utilisation
PS. Thanks to my wife for helping me out with the above SQL.