I have a table that has quiz entries submitted by users. Each row has: date, id, source and a score.
I want to create a report, per source, that calculates the average score for each day taking into account all users, even the ones that didn't reply on that day by considering their latest response.
Right now I have one query that can simply pull all quiz responses for a given day, source and user - if a user submitted multiple quizzes on the same day the latest one is retrieved:
Query
select
id
score,
submission_date,
source
from
quiz
WHERE
id != ""
qualify ROW_NUMBER() OVER(PARTITION BY source, submission_date), id ORDER BY submission_date DESC) = 1;
Now, for each row that is returned from the query above I want to UNION ALL with a similar query as above but where the submission_date, instead of being the latest, should be <= the submission_date of the row it is going to 'union' with.
In another words, the query above could return 'n' rows, each row from table A should have UNION ALL 'n' times with a query that will base itself on the row submission date it is going to UNION ALL with:
Table A:
row 1
row 2
row 3
...
row n
Final table:
row 1
(multiple rows from UNION ALL)
row 2
(multiple rows from UNION ALL)
row 3
(multiple rows from UNION ALL)
...
row n
(multiple rows from UNION ALL)
I don't know if UNION ALL is the solution here but this is what I could think of.
Edit
This is the output of the query above that reads from table named quiz:
id | score | submission_date | source |
---|---|---|---|
user1 | 30 | 2022-09-16 | foxmedia |
user2 | 29 | 2022-09-16 | foxmedia |
user3 | 44 | 2022-09-14 | foxmedia |
user4 | 58 | 2022-09-13 | foxmedia |
user5 | 94 | 2022-09-13 | branding |
user2 | 25 | 2022-09-11 | foxmedia |
user1 | 21 | 2022-09-11 | foxmedia |
user4 | 50 | 2022-09-10 | foxmedia |
user2 | 23 | 2022-09-10 | foxmedia |
user1 | 22 | 2022-09-10 | foxmedia |
user5 | 90 | 2022-09-09 | branding |
As you can see, on 2022-09-16 user1 and user2 have submitted responses but the average for this day should consider the last response from all other users on foxmedia as if they answered the quiz on that same day, which means including on the average calculation rows from user3 on 2022-09-14 and user4 on 2022-09-13. user5 should not count for the average as the source is not branding but foxmedia which should have a separate average, which in this case would only include data from user5.
Any help would be appreciated.
Thanks!
CodePudding user response:
- You have scores for each user, but only for some days.
- For each day you want to have an entry for all users.
- The score for each user is the score of the day. If on a day an user has no score, take the last value from the past.
We start building some sample data. Then we cross join this with every date all_dates
in the dataset. Next we group the data by the all_dates
and user id. For obtaining the last entry of each user, an array of the scores is build. The if
case is needed to not use score data from the future.
with tbl as
(
Select "user1" as id, 30 as score, date("2022-09-16") as submission_date, "foxmedia" as S,
union all Select "user2", 29 ,date("2022-09-16"), "foxmedia",
union all Select "user3", 44 ,date("2022-09-14"), "foxmedia",
union all Select "user4", 58 ,date("2022-09-13"), "foxmedia",
union all Select "user5", 94 ,date("2022-09-13"), "branding",
union all Select "user2", 25 ,date("2022-09-11"), "foxmedia",
union all Select "user1", 21 ,date("2022-09-11"), "foxmedia",
union all Select "user4", 50 ,date("2022-09-10"), "foxmedia",
union all Select "user2", 23 ,date("2022-09-10"), "foxmedia",
union all Select "user1", 22 ,date("2022-09-10"), "foxmedia",
union all Select "user5", 90 ,date("2022-09-09"), "branding"
)
select
S,
all_dates,
id,
array_agg(submission_date order by submission_date desc limit 1)[safe_offset(0)] as last_date,
array_agg(if(submission_date<=all_dates,score,null) ignore nulls order by submission_date desc limit 1)[safe_offset(0)] as last_score,
from tbl
cross join (select distinct submission_date as all_dates from tbl)
group by 1,2,3
order by 1 desc,2 desc,3