Home > Software design >  Right join to an incremental date table not working in SQL
Right join to an incremental date table not working in SQL

Time:10-05

I have two tables. The first is a date_incremental table just holing dates from 1900-01-01 to current in yyyy-mm-dd format. The second table has user behavior. Here are the tables in visual form:

date_incremental

date 
1900-01-01
....
2022-10-05

The user table

date       | user | purchase | country
2020-01-01 | 1    | 10       | US
2020-01-04 | 1    | 8        | US
2020-01-08 | 1    | 2        | US
2020-02-03 | 2    | 45       | GER
2020-02-05 | 2    | 81       | GER
2020-02-06 | 2    | 22       | GER

Now, I want to get a table output where no date gaps exist between the users purchases, i.e. display the history subsequently. If there a days where a user did not purchase anything, the purchase column should be 0 whereas the country column should be the last non-null value. Here is the output I am looking for:

date       | user | purchase | country
2020-01-01 | 1    | 10       | US
2020-01-02 | 1    | 0        | US NEW INSERTED ROW
2020-01-03 | 1    | 0        | US NEW INSERTED ROW
2020-01-04 | 1    | 8        | US
2020-01-05 | 1    | 0        | US NEW INSERTED ROW
2020-01-06 | 1    | 0        | US NEW INSERTED ROW
2020-01-07 | 1    | 0        | US NEW INSERTED ROW
2020-01-08 | 1    | 2        | US

2020-02-03 | 2    | 45       | GER
2020-02-04 | 2    | 0        | GER NEW INSERTED ROW
2020-02-05 | 2    | 81       | GER
2020-02-06 | 2    | 22       | GER

I tried the following query but didnt get this output:

Select a.* 
from user_tbl a 
right join date_incremental b 
on a.date=b.date
  1. My output just returns the exact same user_table output for some reason. How can I fix this?
  2. How do I specify that I want to use the last non-null values when filling the gaps for all non-numeric columns. i.e. Country and fill 0 for numeric values?

Any advise is super appreciated.

CodePudding user response:

So you have to fetch the "window of dates" per user, and bind those for each user, so you can display those results, I do this with yet another cte.

Then you can do the LEFT JOIN to bind the data, and LAG over this

with date_incremental (date) as (
    select * from values
    ('2019-12-31'::date), 
    ('2020-01-01'::date), 
    ('2020-01-02'::date), 
    ('2020-01-03'::date), 
    ('2020-01-04'::date), 
    ('2020-01-05'::date), 
    ('2020-01-06'::date), 
    ('2020-01-07'::date), 
    ('2020-01-08'::date), 
    ('2020-01-09'::date), 
    ('2020-02-02'::date), 
    ('2020-02-03'::date), 
    ('2020-02-04'::date), 
    ('2020-02-05'::date), 
    ('2020-02-06'::date),
    ('2020-02-07'::date)
), user(date,user,purchase,country) as (
    select * from values
    ('2020-01-01'::date, 1, 10, 'US'),
    ('2020-01-04'::date, 1, 8, 'US'),
    ('2020-01-08'::date, 1, 2, 'US'),
    ('2020-02-03'::date, 2, 45, 'GER'),
    ('2020-02-05'::date, 2, 81, 'GER'),
    ('2020-02-06'::date, 2, 22, 'GER')
), enriched_user as (
    select user
        ,min(date) as min_date
        ,max(date) as max_date
    from user
    group by 1
)
select
    d.date
    ,e.user
    ,zeroifnull(a.purchase) as purchase
    ,nvl(a.country, lag(a.country) ignore nulls over (partition by e.user order by d.date)) as country
from enriched_user as e
join date_incremental as d
    on d.date between e.min_date and e.max_date
left join user as a
    on a.user = e.user and a.date = d.date
order by 2, 1;

gives:

DATE USER PURCHASE COUNTRY
2020-01-01 1 10 US
2020-01-02 1 0 US
2020-01-03 1 0 US
2020-01-04 1 8 US
2020-01-05 1 0 US
2020-01-06 1 0 US
2020-01-07 1 0 US
2020-01-08 1 2 US
2020-02-03 2 45 GER
2020-02-04 2 0 GER
2020-02-05 2 81 GER
2020-02-06 2 22 GER
  • Related