I have product data in my table which looks similar to this
product_id | user_id | sales_start | sales_end | quantity |
---|---|---|---|---|
1 | 12 | 2022-01-01 | 2022-02-01 | 15 |
2 | 234 | 2022-11-01 | 2022-12-31 | 123 |
I want to transform the table into a daily snapshot so that it would look something like this:
product_id | user_id | quantity | date |
---|---|---|---|
1 | 12 | 15 | 2022-01-01 |
1 | 12 | 15 | 2022-01-02 |
1 | 12 | 15 | 2022-01-03 |
... | ... | ... | ... |
2 | 234 | 123 | 2022-12-31 |
I know how to do a similar thing in Pandas, but I need to do it within AWS Athena. I thought of getting the date interval and unnest it, but I am struggling with mapping them properly.
Any ideas on how to transform data?
CodePudding user response:
This will help you sequence
SELECT product_id, user_id, quantity, date(date) as date FROM(
VALUES
(1, 12, DATE '2022-01-01', DATE '2022-02-01', 15),
(2, 234, DATE '2022-11-01', DATE '2022-12-31', 123)
) AS t (product_id, user_id, sales_start, sales_end, quantity),
UNNEST(sequence(sales_start, sales_end, interval '1' day)) t(date)
CodePudding user response:
You can use sequnece
to generate dates range and then unnest
it:
-- sample data
with dataset(product_id, user_id, sales_start, sales_end, quantity) as (
values (1, 12 , date '2022-01-01', date '2022-01-05', 15), -- short date ranges
(2, 234, date '2022-11-01', date '2022-11-03', 123) -- short date ranges
)
-- query
select product_id, user_id, quantity, date
from dataset,
unnest(sequence(sales_start, sales_end, interval '1' day)) as t(date);
Output:
product_id | user_id | quantity | date |
---|---|---|---|
1 | 12 | 15 | 2022-01-01 |
1 | 12 | 15 | 2022-01-02 |
1 | 12 | 15 | 2022-01-03 |
1 | 12 | 15 | 2022-01-04 |
1 | 12 | 15 | 2022-01-05 |
2 | 234 | 123 | 2022-11-01 |
2 | 234 | 123 | 2022-11-02 |
2 | 234 | 123 | 2022-11-03 |