How to Shorten Execution Time for A View-CodePudding

I have 3 tables, a user table, an admin table, and a cust table. Both admin and cust tables are foreign keyed to the user_account table. Basically, every user has a user record, and the type of user they are is determined by if they have a record in the admin or the cust table.

user                   admin                   cust
 user_id         user_id | admin_id      user_id | cust_id
---------       ---------|----------    ---------|---------
 1               1       | a             2       | dd
 2               4       | b             3       | ff
 3            
 4

Then I have a login_history table that records the user_id and login timestamp every time a user logs into the app

  login_history
 user_id | login_on
---------|---------------------
 1       | 2022-01-01 13:22:43
 1       | 2022-01-02 16:16:27
 3       | 2022-01-05 21:17:52
 2       | 2022-01-11 11:12:26
 3       | 2022-01-12 03:34:47

I would like to create a view that would contain all dates for the first day of each week in the year starting from jan 1st, and a count column that contains the count of unique admin users that logged in that week and a count of unique cust users that logged in that week. So the resulting view should contain the following 53 records, one for each week.

login_counts_view
 week_start_date | admin_count | cust_count
-----------------|-------------|------------
 2022-01-01      | 1           | 1
 2022-01-08      | 0           | 2
 2022-01-15      | 0           | 0
 .
 .
 .
 2022-12-31      | 0           | 0

Note that the first week (2022-01-01) only has 1 count for admin_count even though the admin with user_id 1 logged in twice that week.

Below is the current query I have for the view. However, the tables are pretty large and it takes over 10 seconds to retrieve all records from the view, mainly because of the left joined date comparisons.

CREATE VIEW login_counts_view  AS
SELECT 
    week_start_dates.week_start_date::text AS week_start_date,
    count(distinct a.user_id) AS admin_count,
    count(distinct c.user_id) AS cust_count
FROM (
    SELECT 
        to_char(i::date, 'YYYY-MM-DD') AS week_start_date 
    FROM 
        generate_series(date_trunc('year', NOW()), to_char(NOW(), 'YYYY-12-31')::date, '1 week') i
) week_start_dates

LEFT JOIN login_history l ON l.login_on::date BETWEEN week_start_dates.week_start_date::date AND (week_start_dates.week_start_date::date   INTERVAL '6 day')::date
LEFT JOIN admin a ON a.user_id = l.user_id 
LEFT JOIN cust c ON c.user_id = l.user_id 
GROUP BY week_start_date;

Does anyone have any tips as to how to make this query execute more efficiently?

CodePudding user response：

Idea

Compute the pseudo-week of each login date: partition the year into 7-day slices and number them consecutively. The pseudo-week of a given date would be the ordinal number of the slice it falls into.

Then operate the joins on integers representing the pseudo-weeks instead of date values and comparisons.

Implementation

A view to implement this follows:

CREATE VIEW login_counts_view_fast  AS
          WITH RECURSIVE Numbers(i) AS ( SELECT 0 UNION ALL SELECT i   1 FROM Numbers WHERE i < 52 )     
        SELECT CAST ( date_trunc('year', NOW()) AS DATE)   7 * n.i  week_start_date
             , count(distinct lw.admin_id)                          admin_count
             , count(distinct lw.cust_id)                           cust_count
          FROM (
                     SELECT i FROM Numbers 
               ) n
     LEFT JOIN (
                     SELECT admin_id
                          , cust_id
                          , base
                          , pit
                          , pit-base                     delta
                          , (pit-base) / (3600 * 24 * 7) week  
                       FROM (
                                  SELECT a.user_id  admin_id
                                       , c.user_id  cust_id
                                       , CAST ( EXTRACT ( EPOCH FROM l.login_on )                 AS INTEGER ) pit
                                       , CAST ( EXTRACT ( EPOCH FROM date_trunc('year', NOW())  ) AS INTEGER ) base
                                    FROM login_history l
                               LEFT JOIN admin         a ON a.user_id = l.user_id
                               LEFT JOIN cust          c ON c.user_id = l.user_id  
                            ) le
               ) lw
                        ON lw.week = n.i
      GROUP BY n.i
;

Some remarks:

The epoch values are the number of seconds elapsed since an absolute base datetime (specifically 1/1/1970 0h00).
CASTS are necessary to convert doubles to integers and timestamps to dates as mandated by the signatures of postgresql date functions and in order to enforce integer arithmetics.
The recursive subquery is a generator of consecutive integers. It could possibly be replaced by a generate_series call (untested)

Evaluation

See it in action in this db fiddle

The query plan indicates savings of 50-70% in execution time.