Sample contents are:
id | created_dt | data |
---|---|---|
1 | 2023-01-14 11:52:41 | {"customers": 1, "payments": 2} |
2 | 2023-01-15 11:53:43 | {"customers": 1, "payments": 2} |
3 | 2023-01-18 11:51:45 | {"customers": 1, "payments": 2} |
4 | 2023-01-15 11:50:48 | {"customers": 1, "payments": 2} |
IDs 2 and 4 fall on the same day, so only one of them should be counted.
I want to get a result as follows:
year | week | customers | payments |
---|---|---|---|
2023 | 2 | 2 | 4 |
2023 | 3 | 1 | 2 |
I solved the problem this way:
SELECT
    date_part('year', sq.created_dt) AS year,
    date_part('week', sq.created_dt) AS week,
    sum((sq.data->'customers')::int) AS customers,
    sum((sq.data->'payments')::int) AS payments
FROM
    (SELECT DISTINCT ON (created_dt::date) created_dt, data
     FROM analytics) sq
GROUP BY
    year, week
ORDER BY
    year, week;
However, the subquery complicates the query considerably. Is there a better method?
I need to group the data by week, but I also need to remove duplicate days.
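To cross-check any query against the sample rows, the requirement (keep one row per calendar day, then sum per week) can be sketched outside the database. The Python below is only an illustration and not part of the original query: the `rows` list and the `weekly_totals` helper are hypothetical names, and it assumes the week number should follow ISO 8601, which is what date_part('week', ...) returns in PostgreSQL.

```python
import json
from collections import defaultdict
from datetime import datetime

# Sample rows from the question as (id, created_dt, data); a hypothetical in-memory copy.
rows = [
    (1, "2023-01-14 11:52:41", '{"customers": 1, "payments": 2}'),
    (2, "2023-01-15 11:53:43", '{"customers": 1, "payments": 2}'),
    (3, "2023-01-18 11:51:45", '{"customers": 1, "payments": 2}'),
    (4, "2023-01-15 11:50:48", '{"customers": 1, "payments": 2}'),
]


def weekly_totals(rows):
    # Step 1: keep one row per calendar day (the first one seen),
    # roughly what DISTINCT ON (created_dt::date) does without an ORDER BY.
    per_day = {}
    for _id, created, data in rows:
        day = datetime.strptime(created, "%Y-%m-%d %H:%M:%S").date()
        per_day.setdefault(day, json.loads(data))

    # Step 2: sum the JSON fields per (calendar year, ISO week).
    totals = defaultdict(lambda: {"customers": 0, "payments": 0})
    for day, data in per_day.items():
        key = (day.year, day.isocalendar()[1])
        totals[key]["customers"] += data["customers"]
        totals[key]["payments"] += data["payments"]
    return dict(sorted(totals.items()))


print(weekly_totals(rows))
# {(2023, 2): {'customers': 2, 'payments': 4}, (2023, 3): {'customers': 1, 'payments': 2}}
```

The output matches the expected result table: week 2 contains 14 and 15 January (with 15 January counted once), and week 3 contains 18 January.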
CodePudding user response:
Using generate_series to build the join table would solve the problem:
SELECT sum((a.data->'customers')::int) AS customers,
       sum((a.data->'payments')::int) AS payments,
       date_part('year', dategroup) AS year,
       date_part('week', dategroup) AS week
-- one row per week start, from today until one month ahead
FROM generate_series(current_date, current_date + interval '1 month', interval '1 week') AS dategroup
-- attach each analytics row to the week bucket it falls into
JOIN analytics AS a ON a.created_dt >= dategroup AND a.created_dt < dategroup + interval '1 week'
GROUP BY dategroup
ORDER BY dategroup;
CodePudding user response:
First of all, I think your query is quite simple and understandable.
Here is the same query written with a WITH query (CTE), which makes it somewhat more readable:
WITH unique_days_data AS (
    SELECT DISTINCT created_dt::date, data_json
    FROM analytics)
SELECT
    date_part('year', ud.created_dt) AS year,
    date_part('week', ud.created_dt) AS week,
    sum((ud.data_json->'customers')::int) AS customers,
    sum((ud.data_json->'payments')::int) AS payments
FROM unique_days_data ud
GROUP BY year, week
ORDER BY year, week;
The difference is that the first query (the CTE) uses the DISTINCT clause, not the DISTINCT ON clause, so two rows from the same day are collapsed only when their data is identical as well.
Here is the sql fiddle.
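The DISTINCT versus DISTINCT ON difference can be illustrated with a small sketch. It is written in Python purely to show the semantics, and the three rows are hypothetical (they are not the question's sample data): they share one day but carry two different payloads. DISTINCT deduplicates on the whole selected row, while DISTINCT ON (created_dt::date) keeps a single row per day.

```python
from datetime import date

# Hypothetical rows as (day, data): same day, two different payloads, one exact duplicate.
rows = [
    (date(2023, 1, 15), '{"customers": 1, "payments": 2}'),
    (date(2023, 1, 15), '{"customers": 3, "payments": 5}'),
    (date(2023, 1, 15), '{"customers": 1, "payments": 2}'),
]

# Like SELECT DISTINCT created_dt::date, data: unique (day, data) pairs.
distinct = sorted(set(rows))

# Like SELECT DISTINCT ON (created_dt::date) ...: one row per day (first one seen).
distinct_on = {}
for day, data in rows:
    distinct_on.setdefault(day, data)

print(len(distinct))     # 2 rows survive: the payloads differ
print(len(distinct_on))  # 1 row survives: one per day
```

With the question's sample data both approaches give the same totals, because the two rows on 15 January hold identical data. They diverge as soon as two rows on one day differ.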
CodePudding user response:
You can simplify it by adding partitioning on "created_dt::date", then filtering the last aggregated record for each week using FETCH FIRST n ROWS WITH TIES.
SELECT date_part('year', created_dt) AS year,
date_part('week', created_dt) AS week,
SUM((data->>'customers')::int) AS customers,
SUM((data->>'payments')::int) AS payments
FROM analytics
GROUP BY year, week, created_dt::date
ORDER BY ROW_NUMBER() OVER(
PARTITION BY date_part('week', created_dt)
ORDER BY created_dt::date DESC
)
FETCH FIRST 1 ROWS WITH TIES
Check the demo here.