I have got two tables, 'page_visits' and 'comments', which store new webpage visits and new comments, respectively.
PAGE_VISITS
id | page_id | created_at |
---|---|---|
1 | 1111 | 2021-12-02T04:55:26.779Z |
2 | 1442 | 2021-12-02T02:25:32.219Z |
3 | 1111 | 2021-12-02T04:55:26.214Z |
COMMENTS
id | page_id | ... | created_at |
---|---|---|---|
1 | 1024 | ... | 2021-12-02T04:55:26.779Z |
2 | 1111 | ... | 2021-12-02T02:25:32.219Z |
3 | 3849 | ... | 2021-12-02T04:55:26.214Z |
I want to aggregate the data from both the tables in the past 1 hour to use for analytics, such that it looks like the table below.
PAGE_DATA
page_id | visit_count | comment_count | created_at |
---|---|---|---|
1024 | 14 | 3 | 2021-12-02T04:55:26.779Z |
1111 | 11 | 8 | 2021-12-02T02:25:32.219Z |
3849 | 1 | 0 | 2021-12-02T04:55:26.214Z |
2412 | 0 | 1 | 2021-12-02T04:55:26.779Z |
SELECT page_visits.page_id
, COUNT(page_visits.id) AS visitCount
, COALESCE(cmts.cmt_cnt,0) AS commentCount
FROM page_visits
LEFT OUTER
JOIN ( SELECT page_id
, COUNT(*) AS cmt_cnt
FROM comments
WHERE created_at >= NOW() - INTERVAL '1 HOUR'
GROUP
BY page_id
) AS cmts
ON cmts.page_id = page_visits.page_id
WHERE page_visits.created_at >= NOW() - INTERVAL '1 HOUR'
GROUP
BY page_visits.page_id, cmts.cmt_cnt;
I have the above code as of now, however, it only prints the row when comment_count is null, but it does not do the same when visit_count is 0 and comment_count is > 0. My first question is, how do I get it to print even when visit_count results as 0. Because someone could have gone on to the page the hour before but only made a comment later on.
Secondly, I am trying to run this code every hour with the use of pg_cron and I know that I can run a function directly in a cron scheduler, however, I am unable to turn the above code into a working postgres function that inserts a new row into the 'page_data' table each time its called.
Could someone help me out with these 2 issues? Thank you.
CodePudding user response:
Consider full join on two aggregates
SELECT page_visits.page_id
, COALESCE(vsts.cmt_cnt, 0) AS visitCount
, COALESCE(cmts.cmt_cnt, 0) AS commentCount
FROM (
SELECT page_id
, COUNT(*) AS vst_cnt
FROM page_visits
WHERE created_at >= NOW() - INTERVAL '1 HOUR'
GROUP BY page_id
) AS vsts
FULL OUTER JOIN (
SELECT page_id
, COUNT(*) AS cmt_cnt
FROM comments
WHERE created_at >= NOW() - INTERVAL '1 HOUR'
GROUP BY page_id
) AS cmts
ON cmts.page_id = vsts.page_id
Alternatively, aggregate a UNION query of both tables:
SELECT page_id
, SUM(vst_n) AS vst_cnt
, SUM(cmt_n) AS cmt_cnt
FROM (
SELECT page_id, 1 AS vst_n, 0 AS cmt_n
FROM page_visits
WHERE created_at >= NOW() - INTERVAL '1 HOUR'
UNION ALL
SELECT page_id, 0 AS vst_n, 1 AS cmt_n
FROM comments
WHERE created_at >= NOW() - INTERVAL '1 HOUR'
) AS sub
GROUP BY page_id
Regarding last question, if I understand you, simply run an insert-select query from above query. Not quite sure how you aggregated created_at
, but add a MIN
or MAX
to above aggregations and include an additional column below:
INSERT INTO page_data (page_id, visit_count, comment_count)
SELECT ...above query...