I have the following dataset:
2021 is my base year, and I want to know which of these products that started 2021 also occurred in 2022. So only if a product has a Contract_start in 2021, I want to pull this same product with contract_start in 2022 as well.
My output should therefore look like this
I tried below query. This approach fails to include all four instances of product C, instead it only pulls one product C for 2021 and one for 2022. How to fix this query so that all products starting in 2021 are included in the right number of rows, without causing duplication for the 2022 products?
with values_2021 as
( select distinct
CUSTOMER_NUMBER||PRODUCT as productcust
from table1
where substr(CONTRACT_START,1,4) = '2021'
)
SELECT distinct
CUSTOMER_NUMBER,
TOTAL_DOLLARS,
PRODUCT,
CONTRACT_START,
CONTRACT_END
FROM table1
join values_2021 on (table1.CUSTOMER_NUMBER||table1.PRODUCT) = values_2021.productcust
where CONTRACT_START >= ('20210101')
CodePudding user response:
Not sure how performant this would be with your data, but hopefully it points you in the right direction.
SELECT b.*
FROM (SELECT DISTINCT customer_number,
product
FROM table1
WHERE Substr(contract_start, 1, 4) = '2021') a
inner join table1 b
ON a.customer_number = b.customer_number
AND a.product = b.product
AND b.contract_start >= ( '20210101' )
CodePudding user response:
One option is to extend what you are doing using common table expressions. Create another CTE for 2022. Then, join both back to the main table. As jarlh mentioned, join both columns instead of using string concatenation. Switching the SUBSTR to LIKE could allow an index to be used if one exists on contract_start.
WITH values_2021 AS (
SELECT customer_number
,product
FROM table1
WHERE contract_start LIKE '2021%'
GROUP BY customer_number
,product
),values_2022 AS (
SELECT customer_number
,product
FROM table1
WHERE contract_start LIKE '2022%'
GROUP BY customer_number
,product
)
SELECT t1.customer_number
,t1.total_dollars
,t1.product
,t1.contract_start
,t1.contract_end
FROM table1 t1
JOIN values_2021 v2021 ON ( t1.customer_number = v2021.customer_number
AND t1.product = v2021.product )
JOIN values_2022 v2022 ON ( t1.customer_number = v2022.customer_number
AND t1.product = v2022.product );