Home > Enterprise >  Use GROUP BY values as columns with COUNT over time
Use GROUP BY values as columns with COUNT over time

Time:12-10

I'm trying to count the different types of emails (template) my system has sent per day. I can get these values with:

select to_char(created_at, 'YYYY-MM-DD') period,
    count(*) num,
    template
from email_log
group by period, template
order by period desc, template

My results look like

period     | num  | template
----------- ------ ---------
2021-12-09 | 185  | reminder
2021-12-09 | 80   | thankyou
2021-12-09 | 1224 | welcome
2021-12-08 | 327  | reminder
2021-12-08 | 108  | thankyou
2021-12-08 | 1698 | welcome

However because of the multiple group bys I'm getting more than one row per day. How can I extrapolate the template value into columns, with the count(*) as the cell value? Meaning my columns are the template names, my rows are the counts. For example, it would look like

period     | reminder | thankyou | welcome
----------- ---------- ---------- --------
2021-12-09 | 185      | 80       | 1224
2021-12-08 | 327      | 108      | 1698

Is this possible to do in plain SQL, and efficiently? Otherwise I need to use another program to take these values as a CSV and restructure the data.

Bonus bonus points if I can add a sum of all the columns at the end as a total


The closest I've been able to get is this, but it requires me to hardcode the template values and it's not very efficient

select to_char(created_at, 'YYYY-MM-DD') period,
    sum(case when template = 'reminder' then 1 else 0 end) as reminder,
    sum(case when template = 'thankyou' then 1 else 0 end) as thankyou,
    sum(case when template = 'welcome' then 1 else 0 end) as welcome,
    count(*) total
from email_log
group by period
order by period desc

CodePudding user response:

Do it client side, or with the query you already have.

There is the crosstab function in tablefunc, but it is pretty gross and I don't think it has any advantages over what you already have.

If you want a total row as well as a total column, you could change the group by period to group by rollup (period)

CodePudding user response:

A (not-straight-forward) solution for a query with the column names and number not known before the run time :

First step : create a composite type email_templates which corresponds to the list of the columns as expected :

CREATE OR REPLACE PROCEDURE email_templates ()
LANGUAGE plpgsql AS
$$
DECLARE
  composite_type_list text ;
BEGIN
  SELECT '(period date,' || string_agg(DISTINCT template || ' integer', ',') || ',total integer)'
    INTO composite_type_list 
    FROM email_log ;

  EXECUTE 'DROP TYPE IF EXISTS email_templates' ;
  EXECUTE 'CREATE TYPE email_templates AS ' || composite_type_list ;
END ;
$$ ;

CALL email_templates () ;

Second step : create the plpgsql function daily_synthesis() with a dynamic query structured according to the new composite type email_templates, and returning a set of json :

CREATE OR REPLACE FUNCTION daily_synthesis ()
RETURNS setof json LANGUAGE plpgsql AS
$$
DECLARE
  columns_list text ;
BEGIN
  SELECT string_agg(DISTINCT 'sum(case when template = ' || quote_nullable(template) || ' then 1 else 0 end) ', ',')
    INTO columns_list 
    FROM email_log ;

  RETURN QUERY EXECUTE 
  E'select row_to_json(row(to_char(created_at, \'YYYY-MM-DD\'), ' || columns_list || ', count(*)) :: email_templates)
     from email_log
    group by created_at
    order by created_at desc' ;
END ;
$$ ;

Third step : call the function daily_synthesis() and convert the json result into a record with the expected columns :

select b.* 
  from daily_synthesis () as a
 cross join lateral json_populate_record (null :: email_templates, a) as b

result in dbfiddle :

period      reminder    thankyou    welcome     total
2021-12-09  2           2           2           6
  • Related