Home > Blockchain >  Find the sum of values from distinct groups
Find the sum of values from distinct groups

Time:05-29

I have a table like below.

id account_id sha256 size
1 1 abc 120
2 1 abc 120
3 1 bcd 150
4 2 abc 120
5 2 def 80
6 3 fed 100
7 3 fed 100

I need to find the sum of the size column but same sha256 for an account should be added only once. Rows to be added should as below.

id account_id sha256 size
1 1 abc 120
3 1 bcd 150
4 2 abc 120
5 2 def 80
6 3 fed 100

Row number 2 and 7 is removed due to duplicate sha256 value per same account. Row 4 is not removed as it belongs to a different account even though it has the same sha256, and sum should be 570.

Tried below query, but giving a syntax error at or near "distinct".

SELECT SUM(f.size) FROM 
(SELECT account_id, DISTINCT sha256, size FROM files GROUP BY account_id, sha256, size) f

CodePudding user response:

Use DISTINCT ON:

SELECT SUM(size) AS total
FROM
(
    SELECT DISTINCT ON (account_id, sha256, size) size
    FROM FILES
    ORDER BY account_id, sha256, size, id
) t;

The above logic retains, for each group of (account_id, sha256, size) values, a single record corresponding to the lowest id value. This set of records is then summed by size to get the total.

CodePudding user response:

You can sum over the distinct combinations of account_id, sha256 and size:

SELECT SUM(size) total_size
FROM (SELECT DISTINCT account_id, sha256, size FROM files) f; 

See the demo.

CodePudding user response:

your data

CREATE TABLE test(
   id         INTEGER  NOT NULL 
  ,account_id INTEGER  NOT NULL
  ,sha256     VARCHAR(40) NOT NULL
  ,size       INTEGER  NOT NULL
);
INSERT INTO test
(id,account_id,sha256,size) VALUES 
(1,1,'abc',120),
(2,1,'abc',120),
(3,1,'bcd',150),
(4,2,'abc',120),
(5,2,'def',80),
(6,3,'fed',100),
(7,3,'fed',100);

use Row_number to distinguishing duplicate values

SELECT SUM(f.size) AS total
FROM   (SELECT id,
               account_id,
               sha256,
               size,
               Row_number ()
                 OVER (
                   partition BY account_id, sha256, size
                   ORDER BY id ASC ) rn
        FROM   test) f
WHERE  rn = 1  

dbfiddle

CodePudding user response:

Boils down to just:

SELECT sum(size) AS total
FROM  (SELECT DISTINCT ON (account_id, sha256) size FROM files) sub;

About DISTINCT ON:

You didn't touch on the case where size would differ for the same (account_id, sha256). I guess that's because that can never happen for some undisclosed reason. If it can happen, you need to define what to do exactly ...

CodePudding user response:

Here is another one..

WITH
    files AS
        (
            Select 1 "ID",  1 "ACCOUNT_ID", 'abc' "SHA256", 120 "SZ" From Dual UNION ALL
            Select 2 "ID",  1 "ACCOUNT_ID", 'abc' "SHA256", 120 "SZ" From Dual UNION ALL
            Select 3 "ID",  1 "ACCOUNT_ID", 'bcd' "SHA256", 150 "SZ" From Dual UNION ALL
            Select 4 "ID",  2 "ACCOUNT_ID", 'abc' "SHA256", 120 "SZ" From Dual UNION ALL
            Select 5 "ID",  2 "ACCOUNT_ID", 'def' "SHA256", 80 "SZ" From Dual UNION ALL
            Select 6 "ID",  3 "ACCOUNT_ID", 'fed' "SHA256", 100 "SZ" From Dual UNION ALL
            Select 7 "ID",  3 "ACCOUNT_ID", 'fed' "SHA256", 100 "SZ" From Dual 
        )
SELECT DISTINCT
    Min(f.ID) OVER(PARTITION BY f.ACCOUNT_ID, f.SHA256 ORDER BY f.ACCOUNT_ID, f.SHA256) "ID",
    f.ACCOUNT_ID "ACCOUNT_ID", 
    f.SHA256 "SHA256",
    f.SZ "SIZE"      
FROM 
    files f
ORDER BY 
    f.ACCOUNT_ID, 
    f.SHA256
--
--  Result
-- ID   ACCOUNT_ID  SHA256     SIZE
--  1           1   abc         120
--  3           1   bcd         150
--  4           2   abc         120
--  5           2   def          80
--  6           3   fed         100

Min(f.ID) OVER(.....) is the first ID of group (ACCOUNT_ID, SHA256), DISTINCT gives us just distinct rows and the rest are values that you asked for. If you sum the size you'll get 570... Regards...

  • Related