I have Table 1 and Table 2, where Table 1 has unique customer id and their report date and Table 2 has the entire date range and corresponding values for each date for each customer id. Basically, I want to pull 2 dates before the report date and 2 dates after the report date for each customer id to get the Desired Output Table below. SQL - pull data for date range based on a start date seems to be the closest to what I want, but I don't know how to tweak the answer to suit my requirement.
Below is my code, but it does not work in Hive/Hue SQL. Appreciate your help :)
SELECT b.date, a.report_date, a.customer_id, b.value
FROM table1 a
LEFT JOIN table2 b ON b.date = a.report_date AND b.customer_id = a.customer_id
WHERE b.date IN (DATE_ADD(a.report_date, 2)
OR b.date IN (DATE_SUB(a.report_date, 2)
TABLE 1
customer id | report date |
---|---|
123 | 15/01/2021 |
456 | 20/02/2021 |
TABLE 2
customer id | date | value |
---|---|---|
... | ... | ... |
123 | 13/01/2021 | 10 |
123 | 14/01/2021 | 13 |
123 | 15/01/2021 | 9 |
123 | 16/01/2021 | 19 |
123 | 17/01/2021 | 20 |
... | ... | ... |
... | ... | ... |
456 | 18/02/2021 | 4 |
456 | 19/02/2021 | 5 |
456 | 20/02/2021 | 2 |
456 | 21/02/2021 | 9 |
456 | 22/02/2021 | 1 |
... | ... | ... |
DESIRED OUTPUT TABLE
date | report date | customer id | value |
---|---|---|---|
13/01/2021 | 15/01/2021 | 123 | 10 |
14/01/2021 | 15/01/2021 | 123 | 13 |
15/01/2021 | 15/01/2021 | 123 | 9 |
16/01/2021 | 15/01/2021 | 123 | 19 |
17/01/2021 | 15/01/2021 | 123 | 20 |
18/02/2021 | 20/02/2021 | 456 | 4 |
19/02/2021 | 20/02/2021 | 456 | 5 |
20/02/2021 | 20/02/2021 | 456 | 2 |
21/02/2021 | 20/02/2021 | 456 | 9 |
22/02/2021 | 20/02/2021 | 456 | 1 |
CodePudding user response:
The problem is in the JOIN segment: you need to JOIN by customer_id and not dates:
ON b.customer_id = a.customer_id
CodePudding user response:
Shouldn't it be
SELECT b.date, a.report_date, a.customer_id, b.value
FROM table1 a
LEFT JOIN table2 b ON b.customer_id = a.customer_id
WHERE
b.date BETWEEN DATE_SUB(a.report_date, 2)
AND DATE_ADD(a.report_date, 2)
I don't really know HIVE SQL but IN
is a set/list operator, so you would be looking for the two dates, but the closing brackets are missing, too.
CodePudding user response:
It's not clear that you actually need an outer join. When you do use one you can't filter on the inner table afterward though.
LEFT JOIN table2 b ON
b.customer_id = a.customer_id
AND b.date BETWEEN DATE_SUB(a.report_date, 2) AND DATE_ADD(a.report_date, 2)