Home > Enterprise >  SQL (bigquery) Repeat values from previous day if they don't exist already
SQL (bigquery) Repeat values from previous day if they don't exist already

Time:07-03

I have a table that includes users that created an event in the app and they produced some revenue based on this event (it's added cummulatively every day):

date user_id revenue
2022-04-01 A 0.5
2022-04-01 B 0.3
2022-04-01 C 0.7
2022-04-02 B 0.6
2022-04-02 C 0.9
2022-04-03 C 1.2

What I want to do is use the data about all the users from the first day, but if they don't bring any revenue, I would like to use the revenue value for this user from the previous day, like so:

date user_id revenue
2022-04-01 A 0.5
2022-04-01 B 0.3
2022-04-01 C 0.7
2022-04-02 A 0.5
2022-04-02 B 0.6
2022-04-02 C 0.9
2022-04-03 A 0.5
2022-04-03 B 0.6
2022-04-03 C 1.2

My initial idea was to somehow copy first day user_id's and leave the revenue value for this day as null:

date user_id revenue
2022-04-01 A 0.5
2022-04-01 B 0.3
2022-04-01 C 0.7
2022-04-02 A null
2022-04-02 B 0.6
2022-04-02 C 0.9
2022-04-03 A null
2022-04-03 B null
2022-04-03 C 1.2

Then I would use this to find the right values to fill NULLs with

SELECT date, 
       user_id, 
       revenue, 
       LAST_VALUE(revenue, IGNORE NULLS) as last_values
FROM table

So the question is, how do I go about "copying" my first day users to every following day in the table? Maybe, there is a better solution than the one I've thought about?

CodePudding user response:

The problem of this task is generating an output that contains the cartesian product of "user_id" values with "date" values. One option is to generate the empty cartesian product first, then LEFT JOIN with your original table and fill the NULL values using a window function.

Instead of the LAST_VALUE function, you could use the MAX window function and limit its frame till the current row. Given that your revenue values are cumulative, you should get exactly the last non-null value as a correct output.

WITH cte AS (
    SELECT *
    FROM       (SELECT DISTINCT date    FROM tab) dates
    INNER JOIN (SELECT DISTINCT user_id FROM tab) users 
            ON 1=1
)
SELECT cte.user_id,
       cte.date,
       COALESCE(tab.revenue, 
                MAX(revenue) OVER(PARTITION BY user_id 
                                  ORDER     BY date   
                                  ROWS UNBOUNDED PRECEDING)) AS revenue
FROM      cte
LEFT JOIN tab 
       ON cte.user_id = tab.user_id
      AND cte.date    = tab.date
ORDER BY cte.user_id,
         cte.date

CodePudding user response:

Consider below approach

select date, user_id, 
  first_value(revenue ignore nulls) over prev_values as revenue
from (select distinct date from your_table),
  (select distinct user_id from your_table)
left join your_table
using(date, user_id)
window prev_values as (
  partition by user_id order by date desc 
  rows between current row and unbounded following
)
order by date, user_id    

if applied to sample data in y our question - output is

enter image description here

  • Related