I have a problem like this in SQL Server. Here is the base table:
submission_date | submission_id | hacker_id | score |
---|---|---|---|
2016-03-01 | 11 | 1 | 0 |
2016-03-01 | 12 | 3 | 15 |
2016-03-01 | 13 | 2 | 60 |
2016-03-03 | 14 | 1 | 0 |
2016-03-03 | 15 | 4 | 60 |
2016-03-03 | 16 | 2 | 25 |
2016-03-03 | 17 | 2 | 60 |
2016-03-08 | 18 | 1 | 0 |
2016-03-08 | 19 | 5 | 70 |
Now I want the hacker_id
that made at least 1 submission each day for every day (I will know the Start Date and End Date of the competition). Below is my desired output:
submission_date | submission_id | hacker_id | score |
---|---|---|---|
2016-03-01 | 11 | 1 | 0 |
2016-03-01 | 12 | 3 | 15 |
2016-03-01 | 13 | 2 | 60 |
2016-03-03 | 14 | 1 | 0 |
2016-03-03 | 16 | 2 | 25 |
2016-03-03 | 17 | 2 | 60 |
2016-03-08 | 18 | 1 | 0 |
There are 3 hackers on 2016-03-01, all counted since it's the first day. On the next day ( 2016-03-03), only 1 and 2 keep submitting, and 2 submits two times. On the final day, there is only 1 who keeps submitting.
This is quite similar to the problem on Hackerrank, and there were lots of posts here about it. But my case is different, the date is not continuous day by day like in this post, the day is skipped
I've already solved the continuous day problem by using Recursive CTE like below:
WITH cte([submission_date], [submission_id], [hacker_id], [score], BaseDate) AS
(
SELECT
[submission_date], [submission_id], [hacker_id], [score],
CAST('2016-03-02' AS date) AS BaseDate
FROM
[dbo].[bSubmissions]
WHERE
[submission_date] = '2016-03-01'
UNION ALL
SELECT
S.[submission_date], S.[submission_id], S.[hacker_id],
S.[score], CAST(DATEADD(DAY, 1, C.BaseDate) AS date)
FROM
[dbo].[bSubmissions] AS S
JOIN
cte AS C ON C.hacker_id = S.hacker_id
AND C.BaseDate = S.[submission_date]
)
select * from cte
But it won't work for this case. My idea is still the same, select the hackers that have posted on the previous cte table (all the previous days), add (union) them to the current cte table (by joining with special conditions), and then continue the recursion.
But I don't know how to properly set up this Recursive CTE
.
My other idea is to create a procedure
which takes hacker_id, start_date, current_date
and checks if that hacker posts every day during the period. But let that idea aside. I want to focus on the recursive/normal method first.
Insert data query for anyone who needed:
CREATE TABLE bSubmissions
(
submission_date DATE NOT NULL
,submission_id INTEGER NOT NULL
,hacker_id INTEGER NOT NULL
,score INTEGER NOT NULL
);
INSERT INTO bSubmissions(submission_date,submission_id,hacker_id,score)
VALUES
('2016-03-01',11,1,0)
,('2016-03-01',12,3,15)
,('2016-03-01',13,2,60)
,('2016-03-03',14,1,0)
,('2016-03-03',15,4,60)
,('2016-03-03',16,2,25)
,('2016-03-03',17,2,60)
,('2016-03-08',18,1,0)
,('2016-03-08',19,5,70);
CodePudding user response:
Instead of using a RECURSIVE CTE
, you can use two ranking indices:
- one for each submission date
- one for each hacker submission (partitioned by "hacker_id")
In this way, if a hacker does not submit one day, he won't be considered anymore in the competition (the ranking indices will differ).
In order to make sure to get all indices correctly, we can use the DENSE_RANK
window function, that will assign the same rank to competitions in the same day, yet keeping the rankings consecutive.
WITH cte AS (
SELECT *,
DENSE_RANK() OVER(
ORDER BY submission_date ) AS rn_competition,
DENSE_RANK() OVER(
PARTITION BY hacker_id
ORDER BY submission_date) AS rn_submission
FROM tab
)
SELECT *
FROM cte
WHERE rn_competition = rn_submission
ORDER BY submission_id
Check the demo here.
Note: the last ORDER BY submission_id
clause is not required.