Home > other >  How to sum records in partition for only part of records
How to sum records in partition for only part of records

Time:12-03

I am struggling with window functions in SQL Server. I have a table that is tracking how many records were transferred. I wrote a query to sum how many rows are transferred for each parameter. However, at some point records had to be resend (they were dropped from the final location and resend). So, if I continue with my old query, I get duplicated values.

This is an example table:

parameter rows min_id max_id create_date status
A1 48 350 521 06.11.2022 sent
A1 48 350 521 06.11.2022 error
A1 78 1 350 05.11.2022 sent
A1 13 299 350 04.11.2022 sent
A1 50 100 299 03.11.2022 sent
A1 15 1 100 01.11.2022 sent
B2 87 800 1202 07.11.2022 sent
B2 187 1 800 06.11.2022 sent
B2 12 570 800 04.11.2022 sent
B2 120 320 570 03.11.2022 sent
B2 55 1 320 01.11.2022 sent

You can understand when the table was resend when min_id is 1 again.

The result I want to achieve is:

parameter sum min_id max_id max_date
A1 126 1 521 06.11.2022
B2 274 1 1202 07.11.2022

What I was able to do so far (but is causing duplicate results):

SELECT * FROM
   (SELECT 
      parameter
      , sum(rows) over (partition by parameter) as sum
      , min_id
      , max_id
      , MAX(create_date) over (partition by parameter) as max_date
   FROM my_table) as s
WHERE create_date = max_date and status = 'sent'

I think that maybe one more window function (nested window function?) needs to be added that will make a certain range of partitions starting with min_id=1 having the latest create_date. However, I failed to do so. Could anyone advise on how to approach this?

CodePudding user response:

Instead of using partitions you can try using the GROUP BY clause since the output which you are expecting might have aggregations in all columns. You can try the below query:

SELECT
   parameter, 
   SUM(rows) AS rows, 
   MIN(min_id) AS min_id,
   MAX(max_id) AS max_id,
   MAX(create_date) AS max_date 
FROM my_table
WHERE status = 'sent'
GROUP BY parameter

This will give you the expected output. But still if you'd like to go with your old query and the duplicates are the only issue, you can try using DISTINCT keyword after SELECT to give you the unique records.

CodePudding user response:

With a small adjustment you could fetch the results as below:

SELECT parameter, sum(rows) as sum, min(min_id) as min_id, max(max_id) as max_id,
  max(create_date) as max_date
FROM
   (SELECT 
      parameter
      , rows
      , min_id
      , max_id
      , create_date
      , status
      , MAX(case when min_id = 1 then create_date end) over (partition by parameter) as sent_start
   FROM my_table) as s
WHERE create_date >= sent_start and status = 'sent'
GROUP BY parameter

It's worth considering the variations in data. Could records be resent with a min_id greater than 1? Can records be sent and resent within the same day?

If any of these are a possibility you may want to test using an EXISTS condition:

OPTION 2

;WITH SentRows as
(
SELECT *
FROM my_table
WHERE status='sent'
)

SELECT parameter, sum(rows) as sum, min(min_id) as min_id, max(max_id) as max_id,
  max(create_date) as max_date 
FROM SentRows as s
WHERE NOT EXISTS
  (SELECT 1 FROM SentRows t WHERE t.parameter = s.parameter AND t.create_date > s.create_date 
    AND t.min_id <= s.min_id AND t.max_id >= s.max_id)
GROUP BY parameter

For partially overlapping records you may want to involve window functions, but here it isn't required.

  • Related