I need to nest two queries. Query B returns a row count N for each machine_id, and that value needs to go into the function LAG(utilisation, N) of query A. In query A below, the hard-coded 89 should be replaced by the result of query B minus 1.

Query A

SELECT
  time AS "time",
  machine_id,
  utilisation - LAG(utilisation,89) OVER ( PARTITION BY machine_id ORDER BY time) as DataUsed
FROM hardware
WHERE
  $__unixEpochFilter(time) AND
  component = 'network'
ORDER BY time

Query B

SELECT
  machine_id,
  count(utilisation) AS N
FROM hardware
WHERE
  $__unixEpochFilter(time) AND
  component = "network"
GROUP BY machine_id

I am using this in Grafana, so $__unixEpochFilter(time) restricts the rows to the time range selected on the dashboard.

Here is an export of the hardware table, filtered to rows where component is "network":

Panel Title-data-2021-11-27 13_12_14.csv

I have tried this, but it is not working:

SELECT
  A.time AS "time",
  A.machine_id,
  A.utilisation - LAG(A.utilisation,B.N - 1) OVER ( PARTITION BY A.machine_id ORDER BY A.time) as DataUsed
FROM hardware as A
  INNER JOIN
    (
    SELECT
        machine_id,
        count(utilisation) AS N
    FROM hardware
    WHERE
        $__unixEpochFilter(time) AND
        component = "network"
    GROUP BY machine_id
    ) as B on A.machine_id = B.machine_id
WHERE
  $__unixEpochFilter(A.time) AND
  component = 'network'
ORDER BY A.time

Here is the generated SQL query; the subquery works. I just don't know how to get B.N into LAG in place of 89.

SELECT
  A.time AS "time",
  A.machine_id,
  B.N,
  A.utilisation - LAG(A.utilisation,89) OVER ( PARTITION BY A.machine_id ORDER BY A.time) as DataUsed
FROM hardware as A
  INNER JOIN
    (
    SELECT
        machine_id,
        count(utilisation) AS N
    FROM hardware
    WHERE
        time >= 1638058073 AND time <= 1638061673 AND
        component = "network"
    GROUP BY machine_id 
    ) as B on A.machine_id = B.machine_id
WHERE
  A.time >= 1638058073 AND A.time <= 1638061673 AND
  component = 'network'
ORDER BY A.time

Answer:

Here are a few examples that calculate the data usage by subtracting the first (or lowest) utilisation from the last. Note that I renamed the "time" column to "tyd" in my examples, as the original name was giving me problems on my test setup. I also commented out the $__unixEpochFilter(time) lines, as they do not apply to my test setup. Uncomment them on your side.

Using Lag

The following gives a single total data usage result for each machine_id. A first WITH clause ("B") determines the row count N, and a second WITH clause ("resultt") calculates the data used with LAG, using N - 1 as the offset so that the last row is compared with the first. Since the LAG calculation produces null for every row except the last one of each machine_id, the final SELECT keeps only the non-null rows.

-- Calculate single total data usage by counting rows and then using lag to
-- subtract the utilisation from the first utilisation row.

WITH B AS (
    SELECT
        machine_id,
        count(utilisation) AS N
    FROM hardware
    WHERE
        -- $__unixEpochFilter(time) AND
        component = 'network'
    GROUP BY machine_id
),
resultt as (
    SELECT
        A.machine_id,
        A.utilisation - (LAG(A.utilisation,(B.N - 1)) OVER ( PARTITION BY A.machine_id ORDER BY A.tyd)) as DataUsed
    FROM hardware A
    JOIN B ON B.machine_id = A.machine_id 
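    WHERE
        -- Use the same time filter as in B, otherwise N will not match
        -- the number of rows in each partition.
        -- $__unixEpochFilter(A.time) AND
        A.component = 'network'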
)
SELECT 
    machine_id,
    DataUsed
FROM resultt
WHERE DataUsed IS NOT NULL -- Since the lag calculation produces null for all rows but the last
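
To see why only the last row of each machine_id survives the final filter, the script can be tried on a few made-up rows. The sketch below is only a test harness under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory table (SQLite 3.25 or newer is needed for window functions), invented sample values, and the "tyd" column name from the examples. Note that some engines (MySQL 8, as far as I know) only accept a literal as the LAG offset, so B.N - 1 may be rejected elsewhere.

```python
import sqlite3

# Made-up sample rows: (tyd, machine_id, component, utilisation).
SAMPLE_ROWS = [
    (1, "m1", "network", 100),
    (2, "m1", "network", 130),
    (3, "m1", "network", 190),
    (1, "m2", "network", 50),
    (2, "m2", "network", 55),
    (1, "m1", "cpu", 7),  # different component, filtered out by the query
]

# Same structure as the script above, with the Grafana macro left out.
LAG_QUERY = """
WITH B AS (
    SELECT machine_id, count(utilisation) AS N
    FROM hardware
    WHERE component = 'network'
    GROUP BY machine_id
),
resultt AS (
    SELECT
        A.machine_id,
        A.utilisation - LAG(A.utilisation, B.N - 1) OVER (
            PARTITION BY A.machine_id ORDER BY A.tyd
        ) AS DataUsed
    FROM hardware A
    JOIN B ON B.machine_id = A.machine_id
    WHERE A.component = 'network'
)
SELECT machine_id, DataUsed
FROM resultt
WHERE DataUsed IS NOT NULL
ORDER BY machine_id
"""


def total_data_used(rows):
    """Load the rows into an in-memory table and run the LAG query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hardware ("
        "tyd INTEGER, machine_id TEXT, component TEXT, utilisation INTEGER)"
    )
    conn.executemany("INSERT INTO hardware VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(LAG_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for machine_id, data_used in total_data_used(SAMPLE_ROWS):
        print(machine_id, data_used)
```

With three rows for m1, the offset is 2, so only the third row has a row two positions back to subtract from; the first two rows yield null and are dropped.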

Subtracting utilisation at first time

The following produces the same results as above, but instead of finding the first utilisation by means of LAG, it finds the utilisation value at the first time.

-- Calculate single total data used as the difference between the last
-- utilisation and the utilisation at first time.
-- This uses the first time to determine the first utilisation.
-- If utilisation is always increasing, calculation of the first time
-- is unnecessary, see the other script.

WITH first_time AS (
    -- Get the first time for each machine_id
    SELECT
        machine_id,
        min(tyd) AS min_tyd
    FROM hardware
    WHERE
        -- $__unixEpochFilter(time) AND
        component = 'network'
    GROUP BY machine_id 
), first_util AS (
    -- Get the utilisation corresponding to the first time
    SELECT
        h.machine_id,
        h.utilisation
    FROM hardware h
    JOIN first_time f ON f.machine_id = h.machine_id AND f.min_tyd = h.tyd
    WHERE h.component = 'network' -- Avoid matching other components logged at the same time
)
-- For each machine_id the total data used can now be calculated as the
-- largest difference between a row's utilisation and the first utilisation.
SELECT
    h.machine_id,
    fu.utilisation AS first_util, -- Only for troubleshooting
    max(h.utilisation - fu.utilisation) AS DataUsed
FROM hardware h
JOIN first_util fu ON fu.machine_id = h.machine_id
WHERE
    -- $__unixEpochFilter(h.time) AND
    h.component = 'network'
GROUP BY h.machine_id, fu.utilisation
ORDER BY h.machine_id

Subtracting minimum utilisation

If the utilisation value is always increasing, the above script can be simplified by removing the time calculation and simply subtracting the lowest utilisation from the highest one, since these are then the first and last values.

-- Calculate the single total data used as the difference between the last
-- and the first (lowest) utilisation for each machine_id.
-- This assumes that utilisation always increases as time passes.

WITH first_util AS (
    -- Get the lowest utilisation for each machine_id
    SELECT
        machine_id,
        min(utilisation) AS min_util
    FROM hardware
    WHERE
        -- $__unixEpochFilter(time) AND
        component = 'network'
    GROUP BY machine_id 
)
-- For each machine_id the total data used can now be calculated as the
-- largest difference between a row's utilisation and the lowest utilisation.
SELECT
    h.machine_id,
    fu.min_util, -- Only for troubleshooting
    max(h.utilisation - fu.min_util) AS DataUsed
FROM hardware h
JOIN first_util fu ON fu.machine_id = h.machine_id
WHERE
    -- $__unixEpochFilter(h.time) AND
    h.component = 'network'
GROUP BY h.machine_id, fu.min_util
ORDER BY h.machine_id
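
The "always increasing" assumption matters: if a counter resets (for example after a reboot), the lowest utilisation is no longer the first one, and the two approaches use different baselines. The following is a minimal sketch with made-up rows that compares the two baselines; Python's sqlite3 module, the in-memory table, and the sample values are assumptions for illustration only.

```python
import sqlite3

# Made-up rows: (tyd, machine_id, component, utilisation).
# m1's counter resets between tyd 2 and tyd 3; m2 only ever increases.
SAMPLE_ROWS = [
    (1, "m1", "network", 100),
    (2, "m1", "network", 130),
    (3, "m1", "network", 20),
    (4, "m1", "network", 60),
    (1, "m2", "network", 50),
    (2, "m2", "network", 55),
]

# Returns, per machine_id, the utilisation at the first time and the
# lowest utilisation, so the two baselines can be compared side by side.
BASELINE_QUERY = """
WITH first_time AS (
    SELECT machine_id, min(tyd) AS min_tyd
    FROM hardware
    WHERE component = 'network'
    GROUP BY machine_id
),
lowest AS (
    SELECT machine_id, min(utilisation) AS min_util
    FROM hardware
    WHERE component = 'network'
    GROUP BY machine_id
)
SELECT h.machine_id, h.utilisation AS first_util, l.min_util
FROM hardware h
JOIN first_time f ON f.machine_id = h.machine_id AND f.min_tyd = h.tyd
JOIN lowest l ON l.machine_id = h.machine_id
WHERE h.component = 'network'
ORDER BY h.machine_id
"""


def baselines(rows):
    """Return (machine_id, first_util, min_util) for each machine_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hardware ("
        "tyd INTEGER, machine_id TEXT, component TEXT, utilisation INTEGER)"
    )
    conn.executemany("INSERT INTO hardware VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(BASELINE_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for machine_id, first_util, min_util in baselines(SAMPLE_ROWS):
        print(machine_id, first_util, min_util)
```

For m2 both baselines agree, while for m1 the reset makes them differ, so the minimum-based script should only be used when resets cannot occur in the selected range.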

Bonus - running total

The above two scripts collapse the results into a single value to obtain the total data usage. To get a running total of the data usage, drop the max() and the GROUP BY so that one row is returned per reading:

-- Calculate running total of data used minus the lowest utilisation.
-- This assumes that utilisation always increases as time passes.

WITH first_util AS (
    -- Get the first (lowest) utilisation for each machine_id
    SELECT
        machine_id,
        min(utilisation) as min_util
    FROM hardware
    WHERE
        -- $__unixEpochFilter(time) AND
        component = 'network'
    GROUP BY machine_id 
)
-- For each row the running utilisation can now be calculated as the difference
-- of the row's utilisation - first utilisation.
SELECT 
    h.tyd, 
    h.machine_id,
    h.utilisation,
    fu.min_util, -- Only for troubleshooting
    h.utilisation - fu.min_util AS DataUsed
FROM hardware h
JOIN first_util fu ON fu.machine_id = h.machine_id
WHERE
    -- $__unixEpochFilter(h.time) AND
    h.component = 'network'
ORDER BY h.machine_id, h.tyd, h.utilisation
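
As an alternative that is not part of the scripts above, a window function can produce a running total without a separate WITH clause: FIRST_VALUE returns the utilisation at the earliest time in each partition, so the baseline is the first reading rather than the lowest one. This is a hedged sketch only. It assumes your database supports FIRST_VALUE, and it uses Python's sqlite3 module, an in-memory table, and made-up rows purely to make the query runnable.

```python
import sqlite3

# Made-up sample rows: (tyd, machine_id, component, utilisation).
SAMPLE_ROWS = [
    (1, "m1", "network", 100),
    (2, "m1", "network", 130),
    (3, "m1", "network", 190),
    (1, "m2", "network", 50),
    (2, "m2", "network", 55),
    (1, "m1", "cpu", 7),  # different component, filtered out by the query
]

# Each row's utilisation minus the first utilisation of its machine_id.
RUNNING_QUERY = """
SELECT
    tyd,
    machine_id,
    utilisation - FIRST_VALUE(utilisation) OVER (
        PARTITION BY machine_id ORDER BY tyd
    ) AS DataUsed
FROM hardware
WHERE component = 'network'
ORDER BY machine_id, tyd
"""


def running_data_used(rows):
    """Return (tyd, machine_id, DataUsed) for every network row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hardware ("
        "tyd INTEGER, machine_id TEXT, component TEXT, utilisation INTEGER)"
    )
    conn.executemany("INSERT INTO hardware VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(RUNNING_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in running_data_used(SAMPLE_ROWS):
        print(row)
```

In Grafana the $__unixEpochFilter line would go back into the WHERE clause, so that the first reading is the first one inside the selected time range.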

PS. Thanks to my wife for helping me out with the above SQL.