Home > Software engineering >  Sum based on column condition
Sum based on column condition

Time:12-09

I am trying to sum sales on my table grouped for next 30 days.

Here is the data I have.

date sales
2021-08-08 35
2021-08-08 14
2021-08-11 35
2021-09-09 22
2021-09-21 44
2021-10-16 46
2021-10-25 9
2021-10-25 1
2021-10-25 2
2021-10-25 6
2021-11-04 1
2021-11-07 1

I am expecting the result like this

date sales total 30d
2021-08-08 14 84
2021-08-08 35 84
2021-08-11 35 57
2021-09-09 22 66
2021-09-21 44 90
2021-10-16 46 76
2021-10-25 9 30
2021-10-25 6 30
2021-10-25 2 30
2021-10-25 1 30
2021-11-04 1 12
2021-11-07 1 29

Explanation: For date 2021-08-08 it's going to sum the sales from date >= 2021-08-08 til 30 days after which is 2021-09-07 so it would be 14 35 35 (2021-08-08, 2021-08-08 and 2021-08-11).

I got my work around using this query, but I think it is very inefficient and I believe there's other way using windows function though I can't wrap my head around it.

with temp as (
    select date, sum(sales) as sales_total
from test_table
group by 1,2
)
, temp2 as (
SELECT a.date, SUM(b.sales_total) total
FROM temp a, temp b
WHERE b.date >= a.date AND b.date <= a.date   interval '30' day
GROUP BY a.date
)
select a.date, a.sales, b.total 
from test_table a 
JOIN temp2 b on a.date = b.date

Any pointer to solve this problem using windows function or any other more efficient way is appreciated. Thanks!

CodePudding user response:

Here is a way to do this using recursive cte.

with data
   as (select start_dt,sales,row_number() over(order by start_dt) as rnk       
         from t1
       )
     ,cte_data
    as ( select d.start_dt,d.sales,d.rnk,d.start_dt as grp_dt
           from data d
          where rnk=1
        union all
        select d.start_dt,d.sales,d.rnk,case when datediff(day,d2.grp_dt,d.start_dt)>30 then 
                                           d.start_dt
                                     else d2.grp_dt
                                 end
          from cte_data d2
          join data d
            on d.rnk=d2.rnk 1
       )
  select min(start_dt) start_range,max(start_dt) end_range,sum(sales) as sum_sales
    from cte_data
 group by grp_dt

 ------------- ------------ ----------- 
| start_range | end_range  | sum_sales |
 ------------- ------------ ----------- 
| 2021-08-08  | 2021-08-11 |        84 |
| 2021-09-09  | 2021-09-21 |        66 |
| 2021-10-16  | 2021-11-07 |        66 |
 ------------- ------------ ----------- 

sql fiddle link https://dbfiddle.uk/2LmQe5n5

To elaborate a bit on this. We would need some kind of recursive process to do this, as the results of the following row depends on the previous entry date, and so I doubt we could do this using window functions.

CodePudding user response:

It seems that you have an overlapping groups problem, you could use a self-join and aggregate as the following:

SELECT T.date, T.sales,
       SUM(D.sales) AS total_30d 
FROM
test_table T JOIN test_table D
ON D.date BETWEEN T.date AND T.date   interval '30' day
GROUP BY T.date, T.sales
ORDER BY T.date, T.sales

See a demo.

CodePudding user response:

Straightforward using a scalar subquery:

select ext.*, 
(
  select sum(sales) from the_table
  where "date" between ext.date and ext.date   30
) as total_30d 
from the_table as ext;

DB-fiddle demo.

date sales sum
2021-08-08 35 84
2021-08-08 14 84
2021-08-11 35 57
2021-09-09 22 66
2021-09-21 44 90
2021-10-16 46 66
2021-10-25 9 20
2021-10-25 1 20
2021-10-25 2 20
2021-10-25 6 20
2021-11-04 1 2
2021-11-07 1 1
  • Related