Home > database >  Postgres insert function by aggregating data from multiple tables
Postgres insert function by aggregating data from multiple tables

Time:12-06

I have got two tables, 'page_visits' and 'comments', which store new webpage visits and new comments, respectively.

PAGE_VISITS

id page_id created_at
1 1111 2021-12-02T04:55:26.779Z
2 1442 2021-12-02T02:25:32.219Z
3 1111 2021-12-02T04:55:26.214Z

COMMENTS

id page_id ... created_at
1 1024 ... 2021-12-02T04:55:26.779Z
2 1111 ... 2021-12-02T02:25:32.219Z
3 3849 ... 2021-12-02T04:55:26.214Z

I want to aggregate the data from both the tables in the past 1 hour to use for analytics, such that it looks like the table below.

PAGE_DATA

page_id visit_count comment_count created_at
1024 14 3 2021-12-02T04:55:26.779Z
1111 11 8 2021-12-02T02:25:32.219Z
3849 1 0 2021-12-02T04:55:26.214Z
2412 0 1 2021-12-02T04:55:26.779Z
SELECT page_visits.page_id
     , COUNT(page_visits.id)    AS visitCount
     , COALESCE(cmts.cmt_cnt,0) AS commentCount
  FROM page_visits
LEFT OUTER
  JOIN ( SELECT page_id
              , COUNT(*) AS cmt_cnt
           FROM comments
          WHERE created_at >= NOW() - INTERVAL '1 HOUR'
         GROUP
             BY page_id
       ) AS cmts
    ON cmts.page_id = page_visits.page_id
 WHERE page_visits.created_at >= NOW() - INTERVAL '1 HOUR'
GROUP
    BY page_visits.page_id, cmts.cmt_cnt;

I have the above code as of now, however, it only prints the row when comment_count is null, but it does not do the same when visit_count is 0 and comment_count is > 0. My first question is, how do I get it to print even when visit_count results as 0. Because someone could have gone on to the page the hour before but only made a comment later on.

Secondly, I am trying to run this code every hour with the use of pg_cron and I know that I can run a function directly in a cron scheduler, however, I am unable to turn the above code into a working postgres function that inserts a new row into the 'page_data' table each time its called.

Could someone help me out with these 2 issues? Thank you.

CodePudding user response:

Consider full join on two aggregates

SELECT page_visits.page_id 
     , COALESCE(vsts.cmt_cnt, 0) AS visitCount 
     , COALESCE(cmts.cmt_cnt, 0) AS commentCount
FROM (
    SELECT page_id
         , COUNT(*) AS vst_cnt 
    FROM page_visits 
    WHERE created_at >= NOW() - INTERVAL '1 HOUR' 
    GROUP BY page_id
   ) AS vsts
FULL OUTER JOIN ( 
    SELECT page_id 
         , COUNT(*) AS cmt_cnt 
    FROM comments
    WHERE created_at >= NOW() - INTERVAL '1 HOUR'
    GROUP BY page_id 
   ) AS cmts
 ON cmts.page_id = vsts.page_id

Alternatively, aggregate a UNION query of both tables:

SELECT page_id
     , SUM(vst_n) AS vst_cnt 
     , SUM(cmt_n) AS cmt_cnt
FROM (
    SELECT page_id, 1 AS vst_n, 0 AS cmt_n
    FROM page_visits 
    WHERE created_at >= NOW() - INTERVAL '1 HOUR'
 
    UNION ALL

    SELECT page_id, 0 AS vst_n, 1 AS cmt_n
    FROM comments 
    WHERE created_at >= NOW() - INTERVAL '1 HOUR' 
   ) AS sub
GROUP BY page_id

Regarding last question, if I understand you, simply run an insert-select query from above query. Not quite sure how you aggregated created_at, but add a MIN or MAX to above aggregations and include an additional column below:

INSERT INTO page_data (page_id, visit_count, comment_count)
SELECT ...above query...
  • Related