Home > Software design >  Agregate boolean table with SQL Server
Agregate boolean table with SQL Server

Time:07-28

I have this table. All values are 0 or 1.

a b c
1 0 0
1 1 0
0 1 0
1 1 1

and I want this one

a b c
a 3 2 1
b 2 3 1
c 1 1 1

This last table answers to the question how many rows have {raw} and {col} set to 1. For example, there are 2 rows where a = b = 1 in the first table, so cell(a,b) = 2.

I have a query that is not suitable for large tables. Is it possible to make it simpler?

SELECT
    'a' AS ' ',  
    SUM(a) AS a, 
    (SELECT SUM(b) FROM tab WHERE a = 1) AS b, 
    (SELECT SUM(c) FROM tab WHERE a = 1) AS c 
FROM 
    tab

UNION

SELECT
    'b', 
    (SELECT SUM(a) FROM tab WHERE b = 1),
    SUM(b), 
    (SELECT SUM(c) FROM tab WHERE b = 1) 
FROM
    tab

UNION

SELECT
    'c', 
    (SELECT SUM(a) FROM tab WHERE c = 1), 
    (SELECT SUM(b) FROM tab WHERE c = 1),
    SUM(c) 
FROM
    tab

CodePudding user response:

You can try the following using conditional aggregation instead of sub-queries, it's probably a little bit cleaner:

select 'a' [ ],
 Sum (case when a = a then a else 0 end) a,
 Sum (case when a = b then b else 0 end) b,
 Sum (case when a = c then c else 0 end) c
from t 
union all
select 'b' [ ],
 Sum (case when b = a then a else 0 end) a,
 Sum (case when b = b then b else 0 end) b,
 Sum (case when b = c then c else 0 end) c
from t 
union all
select 'c' [ ],
 Sum (case when c = a then a else 0 end) a,
 Sum (case when c = b then b else 0 end) b,
 Sum (case when c = c then c else 0 end) c
from t;

DB<>Fiddle

CodePudding user response:

I'm sure there must be a simpler solution but I can't see it. This is what I came up with:

with
d (s, d, x, y) as (
  select 'a', 'b', a, b from t
  union all select 'a', 'c', a, c from t
  union all select 'b', 'c', b, c from t
  union all select 'a', 'a', a, a from t
  union all select 'b', 'b', c, c from t
  union all select 'c', 'c', c, c from t
),
r (s, d, cnt) as (
  select s, d, count(*)
  from d
  where x = 1 and y = 1
  group by s, d
)
select s, max(a) as a, max(b) as b, max(c) as c
from (
  select s,
    sum(case when d = 'a' then cnt else 0 end) as a,
    sum(case when d = 'b' then cnt else 0 end) as b,
    sum(case when d = 'c' then cnt else 0 end) as c
  from r
  group by s
  union all 
  select d,
    sum(case when s = 'a' then cnt else 0 end) as a,
    sum(case when s = 'b' then cnt else 0 end) as b,
    sum(case when s = 'c' then cnt else 0 end) as c
  from r
  group by d
) x
group by s

Result:

 s  a  b  c 
 -- -- -- - 
 a  3  2  1 
 b  2  1  1 
 c  1  1  1 

See running example at db<>fiddle.

CodePudding user response:

You can aggregate only once in the table:

WITH 
  v AS (SELECT v FROM (VALUES ('a'), ('b'), ('c')) v(v)),
  s AS (
    SELECT SUM(a) a, SUM(b) b, SUM(c) c,
           SUM(a * b) ab, SUM(b * c) bc, SUM(c * a) ca
    FROM tab
  )  
SELECT v.v,
       CASE v.v WHEN 'a' THEN s.a WHEN 'b' THEN s.ab WHEN 'c' THEN s.ca END a,
       CASE v.v WHEN 'a' THEN s.ab WHEN 'b' THEN s.b WHEN 'c' THEN s.bc END b,
       CASE v.v WHEN 'a' THEN s.ca WHEN 'b' THEN s.bc WHEN 'c' THEN s.c END c
FROM v CROSS JOIN s
ORDER BY v.v;

See the demo.

CodePudding user response:

Here's an approach that is easy to extend with more columns. Just add them where there are arrows in comments:

with t1 as (select *, row_number() over (order by a) as rn from t),
u(col, rn, val) as (
    select 'a', rn, a from t1 union all
    select 'b', rn, b from t1 union all
    select 'c', rn, c from t1 union all
    select 'd', rn, d from t1                           /* <-- */
), data as (
    select d1.col as Rx, d2.col as Cx, d1.rn, 
        case when d1.val = 1 and d2.val = 1 then 1 else 0 end as v
    from u as d1 inner join u as d2 on d1.rn = d2.rn
)
select Rx as [ ], [a], [b], [c], [d]                    /* <-- */
from (select Rx, Cx, sum(v) v from data group by Rx, Cx) as src
pivot (min(v) for Cx in ([a], [b], [c], [d])) as pvt;   /* <-- */

https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=134e2e39c4cfb55ff1af69c992942aef

CodePudding user response:

You can use GROUPING SETS for this

  Cols = CASE WHEN GROUPING(a) = 0 THEN 'a'
              WHEN GROUPING(b) = 0 THEN 'b'
              ELSE 'c' END,
  a = COUNT(NULLIF(a, 0)),
  b = COUNT(NULLIF(b, 0)),
  c = COUNT(NULLIF(c, 0))
FROM tab
GROUP BY GROUPING SETS (
  (a),
  (b),
  (c)
)
HAVING ISNULL(ISNULL(a, b), c) = 1
ORDER BY Cols;

db<>fiddle

This assumes that you have a bit column. Otherwise you should do SUM(a) etc.


  • GROUPING SETS outputs a separate GROUP BY for each set of grouping columns, a bit like a UNION ALL.
  • The GROUPING function tells you whether a column has been aggregated.
  • COUNT(NULLIF(someValue, 0)) only counts a value if it is not 0.
  • HAVING filters out cases when the grouping column is 0.
  • Related