I have a table like below.
id | account_id | sha256 | size |
---|---|---|---|
1 | 1 | abc | 120 |
2 | 1 | abc | 120 |
3 | 1 | bcd | 150 |
4 | 2 | abc | 120 |
5 | 2 | def | 80 |
6 | 3 | fed | 100 |
7 | 3 | fed | 100 |
I need to find the sum of the size column but same sha256 for an account should be added only once. Rows to be added should as below.
id | account_id | sha256 | size |
---|---|---|---|
1 | 1 | abc | 120 |
3 | 1 | bcd | 150 |
4 | 2 | abc | 120 |
5 | 2 | def | 80 |
6 | 3 | fed | 100 |
Row number 2 and 7 is removed due to duplicate sha256 value per same account. Row 4 is not removed as it belongs to a different account even though it has the same sha256, and sum should be 570.
Tried below query, but giving a syntax error at or near "distinct".
SELECT SUM(f.size) FROM
(SELECT account_id, DISTINCT sha256, size FROM files GROUP BY account_id, sha256, size) f
CodePudding user response:
Use DISTINCT ON
:
SELECT SUM(size) AS total
FROM
(
SELECT DISTINCT ON (account_id, sha256, size) size
FROM FILES
ORDER BY account_id, sha256, size, id
) t;
The above logic retains, for each group of (account_id, sha256, size)
values, a single record corresponding to the lowest id
value. This set of records is then summed by size to get the total.
CodePudding user response:
You can sum over the distinct combinations of account_id
, sha256
and size
:
SELECT SUM(size) total_size
FROM (SELECT DISTINCT account_id, sha256, size FROM files) f;
See the demo.
CodePudding user response:
your data
CREATE TABLE test(
id INTEGER NOT NULL
,account_id INTEGER NOT NULL
,sha256 VARCHAR(40) NOT NULL
,size INTEGER NOT NULL
);
INSERT INTO test
(id,account_id,sha256,size) VALUES
(1,1,'abc',120),
(2,1,'abc',120),
(3,1,'bcd',150),
(4,2,'abc',120),
(5,2,'def',80),
(6,3,'fed',100),
(7,3,'fed',100);
use Row_number
to distinguishing duplicate values
SELECT SUM(f.size) AS total
FROM (SELECT id,
account_id,
sha256,
size,
Row_number ()
OVER (
partition BY account_id, sha256, size
ORDER BY id ASC ) rn
FROM test) f
WHERE rn = 1
CodePudding user response:
Boils down to just:
SELECT sum(size) AS total
FROM (SELECT DISTINCT ON (account_id, sha256) size FROM files) sub;
About DISTINCT ON
:
You didn't touch on the case where size
would differ for the same (account_id, sha256)
. I guess that's because that can never happen for some undisclosed reason. If it can happen, you need to define what to do exactly ...
CodePudding user response:
Here is another one..
WITH
files AS
(
Select 1 "ID", 1 "ACCOUNT_ID", 'abc' "SHA256", 120 "SZ" From Dual UNION ALL
Select 2 "ID", 1 "ACCOUNT_ID", 'abc' "SHA256", 120 "SZ" From Dual UNION ALL
Select 3 "ID", 1 "ACCOUNT_ID", 'bcd' "SHA256", 150 "SZ" From Dual UNION ALL
Select 4 "ID", 2 "ACCOUNT_ID", 'abc' "SHA256", 120 "SZ" From Dual UNION ALL
Select 5 "ID", 2 "ACCOUNT_ID", 'def' "SHA256", 80 "SZ" From Dual UNION ALL
Select 6 "ID", 3 "ACCOUNT_ID", 'fed' "SHA256", 100 "SZ" From Dual UNION ALL
Select 7 "ID", 3 "ACCOUNT_ID", 'fed' "SHA256", 100 "SZ" From Dual
)
SELECT DISTINCT
Min(f.ID) OVER(PARTITION BY f.ACCOUNT_ID, f.SHA256 ORDER BY f.ACCOUNT_ID, f.SHA256) "ID",
f.ACCOUNT_ID "ACCOUNT_ID",
f.SHA256 "SHA256",
f.SZ "SIZE"
FROM
files f
ORDER BY
f.ACCOUNT_ID,
f.SHA256
--
-- Result
-- ID ACCOUNT_ID SHA256 SIZE
-- 1 1 abc 120
-- 3 1 bcd 150
-- 4 2 abc 120
-- 5 2 def 80
-- 6 3 fed 100
Min(f.ID) OVER(.....) is the first ID of group (ACCOUNT_ID, SHA256), DISTINCT gives us just distinct rows and the rest are values that you asked for. If you sum the size you'll get 570... Regards...