Home > database >  Most efficient way to update table column based on sum
Most efficient way to update table column based on sum

Time:01-06

I am looking for the most efficient / minimal code way to update a table column based on the sum of another value in the same table. A method which works and the temp table are shown below.

if object_id('tempdb..#t1') is not null drop table #t1
CREATE TABLE #t1 (id nvarchar(max), astate varchar(16), code varchar(16), price decimal(16,2), total_id_price_bystate decimal(16,2), total_id_price decimal(16,2))

INSERT into #t1 VALUES 
   (100, 'CA', '0123', 123.01, null, null),
   (100, 'CA', '0124', 0.00, null, null),   
   (100, 'PA', '0256', 12.10, null, null),
   (200, 'MA', '0452', 145.00, null, null),
   (300, 'MA', '0578', 134.23, null, null),
   (400, 'CA', '1111', 94.12, null, null),
   (600, 'CA', '0000', 86.34, null, null),
   (500, 'CO', '1111', 0.00, null, null);

update t1
set total_id_price_bystate = sum_price_bystate
from #t1 t1
inner join (
    select  t2_in.Id,
            t2_in.astate,
            sum(t2_in.price) as sum_price_bystate
        from #t1 t2_in
        group by t2_in.id, t2_in.astate
           ) t2
    on t1.id = t2.id
    and t1.astate = t2.astate

update t1
set total_id_price = sum_price
from #t1 t1
inner join (
    select  t3_in.Id,
            sum(t3_in.price) as sum_price
        from #t1 t3_in
        group by t3_in.id
           ) t3
    on t1.id = t3.id

select * from #t1

The main thing I don't like about my method is that it requires an inner join with a subquery that requires the same table itself. So I am looking for a way that might be able to avoid this, although I don't think this method I have is overly complicated. Maybe there isn't any method too much more efficient. To add, I am wondering what the best way would be to combine the two updates together, since they are very similar, but only differ by the group by clause.

CodePudding user response:

As pointed out in the comments, this is not a good way to store data as it violates the basic principles of normalisation -

  • you are storing data that you can compute
  • you are storing the same data multiple times, ie, duplicates.
  • you need to re-calculate the totals whenever any individual values changes
  • it's possible to update a single row and create a data contradiction
  • it's also not a bad thing to pre-calculate aggregations, especially in a data warehouse scenario, but you would still only store the value once per unique key.

Normalisation prevents these issues.

Saying that, you can utilise analytic window functions to compute your values in a single pass over the table:

select *, 
    Sum(price) over(partition by id, astate) total_id_price_bystate,
    Sum(price) over(partition by id) total_id_price
from #t1;

If you really want the data in this format you could create a view and query it:

create view Totals as 
    select id, astate, code, price, total_id_price_bystate, total_id_price,
        Sum(price) over(partition by id, astate) total_bystate,
        Sum(price) over(partition by id) total
    from t1;

select *
from Totals where id = 100;

And to answer your specific question, a view (or a CTE) that touches a single base table can be updated so you can accomplish what you are doing like so:

drop view Totals;

create view Totals as 
    select id, astate, code, price, total_id_price_bystate, total_id_price,
        Sum(price) over(partition by id, astate) total_bystate,
        Sum(price) over(partition by id) total
    from t1;

update totals set
    total_id_price_bystate = total_bystate,
    total_id_price = total;

CodePudding user response:

You can use PARTITION BY to get the two different aggregated value,

if object_id('tempdb..#t1') is not null drop table #t1
CREATE TABLE #t1 (id nvarchar(max), astate varchar(16), code varchar(16), price decimal(16,2), total_id_price_bystate decimal(16,2), total_id_price decimal(16,2))

INSERT into #t1 VALUES 
   (100, 'CA', '0123', 123.01, null, null),
   (100, 'CA', '0124', 0.00, null, null),   
   (100, 'PA', '0256', 12.10, null, null),
   (200, 'MA', '0452', 145.00, null, null),
   (300, 'MA', '0578', 134.23, null, null),
   (400, 'CA', '1111', 94.12, null, null),
   (600, 'CA', '0000', 86.34, null, null),
   (500, 'CO', '1111', 0.00, null, null);

update t1
set total_id_price_bystate = sum_price_bystate,total_id_price=sum_price
from #t1 t1
inner join (
    select  t2_in.Id,
            t2_in.astate,
            sum(t2_in.price) over(partition by t2_in.id, t2_in.astate) as sum_price_bystate,
            sum(t2_in.price) over(partition by t2_in.id) as sum_price
        from #t1 t2_in
           ) t2
    on t1.id = t2.id
    and t1.astate = t2.astate


select * from #t1
  • Related