Home > Mobile >  SQL to UNION ALL n times depending on rows of table
SQL to UNION ALL n times depending on rows of table

Time:09-17

I have a table that has quiz entries submitted by users. Each row has: date, id, source and a score.

I want to create a report, per source, that calculates the average score for each day taking into account all users, even the ones that didn't reply on that day by considering their latest response.

Right now I have one query that can simply pull all quiz responses for a given day, source and user - if a user submitted multiple quizzes on the same day the latest one is retrieved:

Query

      select
        id
        score,
        submission_date,
        source
      from
        quiz
      WHERE
        id != ""
      qualify ROW_NUMBER() OVER(PARTITION BY source, submission_date), id ORDER BY submission_date DESC) = 1;

Now, for each row that is returned from the query above I want to UNION ALL with a similar query as above but where the submission_date, instead of being the latest, should be <= the submission_date of the row it is going to 'union' with.

In another words, the query above could return 'n' rows, each row from table A should have UNION ALL 'n' times with a query that will base itself on the row submission date it is going to UNION ALL with:

Table A:
row 1
row 2
row 3
...
row n

Final table:
row 1
(multiple rows from UNION ALL)
row 2
(multiple rows from UNION ALL)
row 3
(multiple rows from UNION ALL)
...
row n
(multiple rows from UNION ALL)

I don't know if UNION ALL is the solution here but this is what I could think of.

Edit

This is the output of the query above that reads from table named quiz:

id score submission_date source
user1 30 2022-09-16 foxmedia
user2 29 2022-09-16 foxmedia
user3 44 2022-09-14 foxmedia
user4 58 2022-09-13 foxmedia
user5 94 2022-09-13 branding
user2 25 2022-09-11 foxmedia
user1 21 2022-09-11 foxmedia
user4 50 2022-09-10 foxmedia
user2 23 2022-09-10 foxmedia
user1 22 2022-09-10 foxmedia
user5 90 2022-09-09 branding

As you can see, on 2022-09-16 user1 and user2 have submitted responses but the average for this day should consider the last response from all other users on foxmedia as if they answered the quiz on that same day, which means including on the average calculation rows from user3 on 2022-09-14 and user4 on 2022-09-13. user5 should not count for the average as the source is not branding but foxmedia which should have a separate average, which in this case would only include data from user5.

Any help would be appreciated.

Thanks!

CodePudding user response:

  • You have scores for each user, but only for some days.
  • For each day you want to have an entry for all users.
  • The score for each user is the score of the day. If on a day an user has no score, take the last value from the past.

We start building some sample data. Then we cross join this with every date all_dates in the dataset. Next we group the data by the all_dates and user id. For obtaining the last entry of each user, an array of the scores is build. The if case is needed to not use score data from the future.

with tbl as 
(
Select "user1" as id, 30 as score, date("2022-09-16") as submission_date, "foxmedia" as S,
 union all Select "user2", 29 ,date("2022-09-16"), "foxmedia",
 union all Select "user3", 44 ,date("2022-09-14"), "foxmedia",
 union all Select "user4", 58 ,date("2022-09-13"), "foxmedia",
 union all Select "user5", 94 ,date("2022-09-13"), "branding",
 union all Select "user2", 25 ,date("2022-09-11"), "foxmedia",
 union all Select "user1", 21 ,date("2022-09-11"), "foxmedia",
 union all Select "user4", 50 ,date("2022-09-10"), "foxmedia",
 union all Select "user2", 23 ,date("2022-09-10"), "foxmedia",
 union all Select "user1", 22 ,date("2022-09-10"), "foxmedia",
 union all Select "user5", 90 ,date("2022-09-09"), "branding"
)
select 
S,
all_dates,
id,
array_agg(submission_date order by submission_date desc limit 1)[safe_offset(0)] as last_date,
array_agg(if(submission_date<=all_dates,score,null) ignore nulls order by submission_date desc limit 1)[safe_offset(0)] as last_score,

from tbl
cross join (select distinct submission_date as all_dates from tbl)
group by 1,2,3
order by 1 desc,2 desc,3 

  • Related