Home > Software engineering >  Return number of rows (streak count) from entries where each entry is the previous days date
Return number of rows (streak count) from entries where each entry is the previous days date

Time:10-12

I have a table with two columns, userid and date. I want to query this table using a specific userid and date, and from that I want to return the row count of entries going backwards from the entered date. i.e. where userid = 1 AND date = '2020-07-09'. It should stop counting if there is a gap between the next date.

So here's my table:

userid | date
-------------------
1     | 2020-07-27
1     | 2020-07-28
2     | 2020-07-28
1     | 2020-07-29

The streak for userid 1 and date 2020-07-29 would be 3.

Now if I remove an entry:

userid | date
-------------------
1     | 2020-07-27
2     | 2020-07-28
1     | 2020-07-29

The streak for userid 1 and date 2020-07-29 would be 1. This is because the 2020-07-28 date is missing for the userid.

How could I do this with postgres sql? I have looked into the generate_series function but this requires me to set a start and end date.

CodePudding user response:

You could achieve this using the following:

Approach 1

Using window functions, you could achieve this eg

SELECT
    MAX("date") - MIN("date")    1  as streak
FROM (
    SELECT
        *,
        SUM(date_cont) OVER (PARTITION BY "userid" ORDER BY "date" ASC) as gn
    FROM (
        SELECT 
            * ,
            CASE
                WHEN "date"-LAG("date",1,("date"- interval '1 day')::date) OVER (
                       PARTITION BY  "userid"
                       ORDER BY "date"
                ) =1 THEN 0
                ELSE 1
             END as date_cont
        FROM t 
        WHERE "userid"=1 AND "date" <= '2020-07-29' 
        
    ) t1
) t2 
GROUP BY gn, "userid"
ORDER BY gn DESC
LIMIT 1

or replace MAX("date") - MIN("date") 1 with COUNT(1), or if you would like the entire row data

SELECT
    *,
    MAX("date") OVER (PARTITION BY gn,"userid") - 
    MIN("date") OVER (PARTITION BY gn,"userid")   1  as streak
FROM (
    SELECT
        *,
        SUM(date_cont) OVER (PARTITION BY "userid" ORDER BY "date" ASC) as gn
    FROM (
        SELECT 
            * ,
            CASE
                WHEN "date"-LAG("date",1,("date"- interval '1 day')::date) OVER (
                       PARTITION BY  "userid"
                       ORDER BY "date"
                ) =1 THEN 0
                ELSE 1
             END as date_cont
        FROM t 
        WHERE "userid"=1 AND "date" <= '2020-07-29' 
        
    ) t1
) t2 
ORDER BY gn DESC
LIMIT 1

or replace MAX("date") OVER (PARTITION BY gn,"userid") - MIN("date") OVER (PARTITION BY gn,"userid") 1 with COUNT(1) OVER (PARTITION BY gn,"userid").

NB. Since we have filtered based on userid we could simple partition by gn only

Approach 2

Create a function to extract the streak. This function loops through the data and breaks when it determines that the streak has been broken.

CREATE OR REPLACE FUNCTION getStreak(
    user_id int,
    start_date DATE
) RETURNS int AS
$BODY$
DECLARE
    r  RECORD; 
BEGIN
    FOR r IN (
      SELECT 
          * ,
          ROW_NUMBER() OVER (
              PARTITION BY "userid"
              ORDER BY "date" DESC
          ) as streak,
          "date"-LAG("date",1,"date") OVER (
              PARTITION BY  "userid"
              ORDER BY "date"
          ) as date_diff
      FROM t 
      WHERE "userid"=user_id AND "date" <= start_date 
      ORDER BY "date" DESC
    )
    LOOP
        IF r.date_diff > 1 THEN
            RETURN r.streak;        
        END IF;
    END LOOP;
    RETURN COALESCE(r.streak,0);
END;
$BODY$
LANGUAGE plpgsql;

with example usage

SELECT getStreak(1,'2020-07-29');

Approach 3

This approach identifies the streak using the difference from the chosen date and row number

SELECT 
    MAX(rn) as streak
FROM (
        SELECT 
            * ,
            MIN('2020-07-29'::date- "date")  OVER (PARTITION BY "userid") as earliest_diff,
            ROW_NUMBER() OVER (PARTITION BY "userid" ORDER BY "date" DESC) as rn,
            ('2020-07-29'::date- "date") as diff
        FROM t 
        WHERE "date" <='2020-07-29' AND "userid"=1
) t1
WHERE rn = (diff-earliest_diff 1) 

Demo

You may view a working demo with test cases here

Let me know if this works for you.

  • Related