I have a table with two columns, userid and date. I want to query this table using a specific userid and date, and from that I want to return the row count of entries going backwards from the entered date. i.e. where userid = 1 AND date = '2020-07-09'
. It should stop counting if there is a gap between the next date.
So here's my table:
userid | date
-------------------
1 | 2020-07-27
1 | 2020-07-28
2 | 2020-07-28
1 | 2020-07-29
The streak for userid 1
and date 2020-07-29
would be 3.
Now if I remove an entry:
userid | date
-------------------
1 | 2020-07-27
2 | 2020-07-28
1 | 2020-07-29
The streak for userid 1
and date 2020-07-29
would be 1. This is because the 2020-07-28 date is missing for the userid.
How could I do this with postgres sql? I have looked into the generate_series
function but this requires me to set a start and end date.
CodePudding user response:
You could achieve this using the following:
Approach 1
Using window functions, you could achieve this eg
SELECT
MAX("date") - MIN("date") 1 as streak
FROM (
SELECT
*,
SUM(date_cont) OVER (PARTITION BY "userid" ORDER BY "date" ASC) as gn
FROM (
SELECT
* ,
CASE
WHEN "date"-LAG("date",1,("date"- interval '1 day')::date) OVER (
PARTITION BY "userid"
ORDER BY "date"
) =1 THEN 0
ELSE 1
END as date_cont
FROM t
WHERE "userid"=1 AND "date" <= '2020-07-29'
) t1
) t2
GROUP BY gn, "userid"
ORDER BY gn DESC
LIMIT 1
or replace MAX("date") - MIN("date") 1
with COUNT(1)
,
or if you would like the entire row data
SELECT
*,
MAX("date") OVER (PARTITION BY gn,"userid") -
MIN("date") OVER (PARTITION BY gn,"userid") 1 as streak
FROM (
SELECT
*,
SUM(date_cont) OVER (PARTITION BY "userid" ORDER BY "date" ASC) as gn
FROM (
SELECT
* ,
CASE
WHEN "date"-LAG("date",1,("date"- interval '1 day')::date) OVER (
PARTITION BY "userid"
ORDER BY "date"
) =1 THEN 0
ELSE 1
END as date_cont
FROM t
WHERE "userid"=1 AND "date" <= '2020-07-29'
) t1
) t2
ORDER BY gn DESC
LIMIT 1
or replace MAX("date") OVER (PARTITION BY gn,"userid") - MIN("date") OVER (PARTITION BY gn,"userid") 1
with COUNT(1) OVER (PARTITION BY gn,"userid")
.
NB. Since we have filtered based on userid
we could simple partition by gn
only
Approach 2
Create a function to extract the streak. This function loops through the data and breaks when it determines that the streak has been broken.
CREATE OR REPLACE FUNCTION getStreak(
user_id int,
start_date DATE
) RETURNS int AS
$BODY$
DECLARE
r RECORD;
BEGIN
FOR r IN (
SELECT
* ,
ROW_NUMBER() OVER (
PARTITION BY "userid"
ORDER BY "date" DESC
) as streak,
"date"-LAG("date",1,"date") OVER (
PARTITION BY "userid"
ORDER BY "date"
) as date_diff
FROM t
WHERE "userid"=user_id AND "date" <= start_date
ORDER BY "date" DESC
)
LOOP
IF r.date_diff > 1 THEN
RETURN r.streak;
END IF;
END LOOP;
RETURN COALESCE(r.streak,0);
END;
$BODY$
LANGUAGE plpgsql;
with example usage
SELECT getStreak(1,'2020-07-29');
Approach 3
This approach identifies the streak using the difference from the chosen date and row number
SELECT
MAX(rn) as streak
FROM (
SELECT
* ,
MIN('2020-07-29'::date- "date") OVER (PARTITION BY "userid") as earliest_diff,
ROW_NUMBER() OVER (PARTITION BY "userid" ORDER BY "date" DESC) as rn,
('2020-07-29'::date- "date") as diff
FROM t
WHERE "date" <='2020-07-29' AND "userid"=1
) t1
WHERE rn = (diff-earliest_diff 1)
Demo
You may view a working demo with test cases here
Let me know if this works for you.