How to group by week and distinct by day in PostgreSQL

Sample contents are:

id	created_dt	data
1	2023-01-14 11:52:41	{"customers": 1, "payments": 2}
2	2023-01-15 11:53:43	{"customers": 1, "payments": 2}
3	2023-01-18 11:51:45	{"customers": 1, "payments": 2}
4	2023-01-15 11:50:48	{"customers": 1, "payments": 2}

Rows 2 and 4 fall on the same day (2023-01-15), so only one of them should be counted.

I want to get a result as follows:

year	week	customers	payments
2023	2	2	4
2023	3	1	2
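For anyone who wants to try the queries in this thread, the sample rows can be loaded with a setup along these lines. The table name analytics comes from the queries in this thread, but the column types (timestamp and jsonb) are assumptions, since the table definition is not shown.

```sql
-- Assumed schema: the question does not show the real table definition.
CREATE TABLE analytics (
    id         integer PRIMARY KEY,
    created_dt timestamp NOT NULL,
    data       jsonb NOT NULL
);

-- The four sample rows from the question.
INSERT INTO analytics (id, created_dt, data) VALUES
    (1, '2023-01-14 11:52:41', '{"customers": 1, "payments": 2}'),
    (2, '2023-01-15 11:53:43', '{"customers": 1, "payments": 2}'),
    (3, '2023-01-18 11:51:45', '{"customers": 1, "payments": 2}'),
    (4, '2023-01-15 11:50:48', '{"customers": 1, "payments": 2}');
```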

I solved the problem this way:

SELECT
    date_part('year', sq.created_dt) AS year,
    date_part('week', sq.created_dt) AS week,
    sum((sq.data->'customers')::int) AS customers,
    sum((sq.data->'payments')::int) AS payments
FROM 
    (SELECT DISTINCT ON (created_dt::date) created_dt, data 
     FROM analytics) sq
GROUP BY 
    year, week
ORDER BY 
    year, week;
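One caveat with DISTINCT ON: without an ORDER BY in the subquery, which of the same-day rows survives is unspecified. Below is a minimal sketch of a deterministic variant. Keeping the earliest row of each day is an arbitrary choice made for illustration, and switching to ->> and isoyear is a suggestion, not part of the original query.

```sql
SELECT
    -- 'week' is the ISO week, so it is paired with the ISO year here;
    -- early-January dates can belong to the last week of the previous ISO year.
    date_part('isoyear', sq.created_dt) AS year,
    date_part('week', sq.created_dt)    AS week,
    -- ->> returns text, so the cast works for both json and jsonb columns.
    sum((sq.data->>'customers')::int)   AS customers,
    sum((sq.data->>'payments')::int)    AS payments
FROM (
    -- Keep exactly one row per calendar day: the earliest one.
    SELECT DISTINCT ON (created_dt::date) created_dt, data
    FROM analytics
    ORDER BY created_dt::date, created_dt
) sq
GROUP BY year, week
ORDER BY year, week;
```

On the four sample rows this should keep id 4 rather than id 2 for 2023-01-15. Since both rows carry the same data, the weekly totals are the same either way.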

However, the subquery complicates the query considerably. Is there a better method?

I need to group the data by week, but I also need to remove duplicate days.

CodePudding user response：

Using generate_series to build the weekly buckets and joining against them could help:

-- One bucket per week start; adjust the series bounds so they cover your data.
-- Note: this groups rows into weeks but does not drop duplicate days by itself.
SELECT sum((a.data->'customers')::int) AS customers,
       sum((a.data->'payments')::int) AS payments,
       date_part('year', dategroup) AS year,
       date_part('week', dategroup) AS week
FROM generate_series(current_date, current_date + interval '1 month', interval '1 week') AS dategroup
JOIN analytics AS a
  ON a.created_dt >= dategroup
 AND a.created_dt < dategroup + interval '1 week'
GROUP BY dategroup
ORDER BY dategroup;

CodePudding user response：

First of all, I think your query is already quite simple and understandable.

Here is the same query written with a WITH query (CTE), which in some cases makes it more readable:

WITH unique_days_data AS (
  -- One row per distinct (day, data) pair; the column is named data as in the question.
  SELECT DISTINCT created_dt::date AS created_dt, data
  FROM analytics)
SELECT 
    date_part('year', ud.created_dt) AS year,
    date_part('week', ud.created_dt) AS week,
    sum((ud.data->'customers')::int) AS customers,
    sum((ud.data->'payments')::int) AS payments
FROM unique_days_data ud
GROUP BY year, week
ORDER BY year, week;

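The difference is that this query uses a plain DISTINCT clause instead of DISTINCT ON, so two rows from the same day are collapsed only if their data values are identical as well. Plain DISTINCT also requires the column to be jsonb, because the json type has no equality operator.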
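To see how plain DISTINCT and DISTINCT ON differ, here is a small self-contained sketch. The two rows are made up for illustration: they share a day but carry different data.

```sql
-- Hypothetical rows: same calendar day, different JSON payloads.
WITH sample_rows(created_dt, data) AS (
    VALUES
        (timestamp '2023-01-15 11:53:43', '{"customers": 1, "payments": 2}'::jsonb),
        (timestamp '2023-01-15 11:50:48', '{"customers": 3, "payments": 5}'::jsonb)
)
-- Plain DISTINCT compares the whole (day, data) pair, so both rows survive.
SELECT 'DISTINCT' AS method, count(*) AS rows_kept
FROM (SELECT DISTINCT created_dt::date, data FROM sample_rows) d
UNION ALL
-- DISTINCT ON compares only the day, so a single row survives.
SELECT 'DISTINCT ON', count(*)
FROM (SELECT DISTINCT ON (created_dt::date) created_dt, data FROM sample_rows) o;
```

This should report 2 rows kept for DISTINCT and 1 for DISTINCT ON. The two approaches agree only when same-day duplicates are exact copies, as they are in the question's sample data.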


CodePudding user response：

You can simplify it by also grouping on created_dt::date, then keeping only the last aggregated row of each week using FETCH FIRST n ROWS WITH TIES (available since PostgreSQL 13).

SELECT date_part('year', created_dt) AS year,
       date_part('week', created_dt) AS week,
       SUM((data->>'customers')::int) AS customers,
       SUM((data->>'payments')::int) AS payments
FROM analytics
GROUP BY year, week, created_dt::date
ORDER BY ROW_NUMBER() OVER(
             PARTITION BY date_part('week', created_dt) 
             ORDER     BY created_dt::date DESC
         )
FETCH FIRST 1 ROWS WITH TIES
