I have two tables. The first is a date_incremental table just holing dates from 1900-01-01 to current in yyyy-mm-dd format. The second table has user behavior. Here are the tables in visual form:
date_incremental
date
1900-01-01
....
2022-10-05
The user table
date | user | purchase | country
2020-01-01 | 1 | 10 | US
2020-01-04 | 1 | 8 | US
2020-01-08 | 1 | 2 | US
2020-02-03 | 2 | 45 | GER
2020-02-05 | 2 | 81 | GER
2020-02-06 | 2 | 22 | GER
Now, I want to get a table output where no date gaps exist between the users purchases, i.e. display the history subsequently. If there a days where a user did not purchase anything, the purchase column should be 0 whereas the country column should be the last non-null value. Here is the output I am looking for:
date | user | purchase | country
2020-01-01 | 1 | 10 | US
2020-01-02 | 1 | 0 | US NEW INSERTED ROW
2020-01-03 | 1 | 0 | US NEW INSERTED ROW
2020-01-04 | 1 | 8 | US
2020-01-05 | 1 | 0 | US NEW INSERTED ROW
2020-01-06 | 1 | 0 | US NEW INSERTED ROW
2020-01-07 | 1 | 0 | US NEW INSERTED ROW
2020-01-08 | 1 | 2 | US
2020-02-03 | 2 | 45 | GER
2020-02-04 | 2 | 0 | GER NEW INSERTED ROW
2020-02-05 | 2 | 81 | GER
2020-02-06 | 2 | 22 | GER
I tried the following query but didnt get this output:
Select a.*
from user_tbl a
right join date_incremental b
on a.date=b.date
- My output just returns the exact same user_table output for some reason. How can I fix this?
- How do I specify that I want to use the last non-null values when filling the gaps for all non-numeric columns. i.e. Country and fill 0 for numeric values?
Any advise is super appreciated.
CodePudding user response:
So you have to fetch the "window of dates" per user, and bind those for each user, so you can display those results, I do this with yet another cte.
Then you can do the LEFT JOIN to bind the data, and LAG over this
with date_incremental (date) as (
select * from values
('2019-12-31'::date),
('2020-01-01'::date),
('2020-01-02'::date),
('2020-01-03'::date),
('2020-01-04'::date),
('2020-01-05'::date),
('2020-01-06'::date),
('2020-01-07'::date),
('2020-01-08'::date),
('2020-01-09'::date),
('2020-02-02'::date),
('2020-02-03'::date),
('2020-02-04'::date),
('2020-02-05'::date),
('2020-02-06'::date),
('2020-02-07'::date)
), user(date,user,purchase,country) as (
select * from values
('2020-01-01'::date, 1, 10, 'US'),
('2020-01-04'::date, 1, 8, 'US'),
('2020-01-08'::date, 1, 2, 'US'),
('2020-02-03'::date, 2, 45, 'GER'),
('2020-02-05'::date, 2, 81, 'GER'),
('2020-02-06'::date, 2, 22, 'GER')
), enriched_user as (
select user
,min(date) as min_date
,max(date) as max_date
from user
group by 1
)
select
d.date
,e.user
,zeroifnull(a.purchase) as purchase
,nvl(a.country, lag(a.country) ignore nulls over (partition by e.user order by d.date)) as country
from enriched_user as e
join date_incremental as d
on d.date between e.min_date and e.max_date
left join user as a
on a.user = e.user and a.date = d.date
order by 2, 1;
gives:
DATE | USER | PURCHASE | COUNTRY |
---|---|---|---|
2020-01-01 | 1 | 10 | US |
2020-01-02 | 1 | 0 | US |
2020-01-03 | 1 | 0 | US |
2020-01-04 | 1 | 8 | US |
2020-01-05 | 1 | 0 | US |
2020-01-06 | 1 | 0 | US |
2020-01-07 | 1 | 0 | US |
2020-01-08 | 1 | 2 | US |
2020-02-03 | 2 | 45 | GER |
2020-02-04 | 2 | 0 | GER |
2020-02-05 | 2 | 81 | GER |
2020-02-06 | 2 | 22 | GER |