I am trying to retrieve records based on a custom field "ci_ku". For the same values of "ci_ku" we will be having multiple "l1m_visits", and I want to retrieve the minimum value of "l1mvisits" for each "ci_ku". and i want to get the ci_ku by removing the string between 1st and 2nd underscore(-) in ku, how to write a query to get the ci_ku here
Sample Data:
ku | item | l1m_visits |
---|---|---|
1234-5678-HIJK | 1234 | A |
1234-9012-HIJK | 1234 | B |
56457-12456-DF-GH-TC | 56457 | D |
Expected Output:
ku | ci_ku | l1m_visits |
---|---|---|
1234-5678-HIJK | 1234-HIJK | A |
56457-12456-DF-GH-TC | 56457-DF-GH-TC | D |
Have tried the query below:
WITH tab_with_ci_ku AS (
select split(ku, '-', 3)ivm_arr,
l1m_visits,
last_refresh_date
FROM db.scema.table
), ranked_visits AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY CONCAT(ivm_arr[2],item) as ci_sku ORDER BY l1m_visits) AS rn
FROM tab_with_ci_ku
)
SELECT sku,ci_ku
FROM ranked_visits
WHERE rn = 1
and facing the following error:
mismatched input 'ci_ku'. Expecting: 'ALTER', 'ANALYZE', 'CALL', 'COMMENT', 'COMMIT', 'CREATE', 'DEALLOCATE', 'DELETE', 'DENY', 'DESC', 'DESCRIBE', 'DROP', 'EXECUTE', 'EXPLAIN', 'GRANT', 'INSERT', 'MERGE', 'PREPARE', 'REFRESH', 'RESET', 'REVOKE', 'ROLLBACK', 'SET', 'SHOW', 'START', 'TRUNCATE', 'UPDATE', 'USE', <query>
CodePudding user response:
I would use a regex replacement to handle the ci_ku
requirement. Note that this replacement should happen first, as the call to ROW_NUMBER
depends on it.
WITH tab_with_ci_ku AS (
SELECT *, REGEXP_REPLACE(ku, '([^-] )-[^-] -(.*)', '$1-$2') AS ci_ku
FROM hdpsa30.pps.ivm_bands_boomerang_sos
),
ranked_visits AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY ci_ku ORDER BY l1m_visits) AS rn
FROM tab_with_ci_ku
)
SELECT ku, ci_ku, l1m_visits
FROM ranked_visits
WHERE rn = 1;
Here is an explanation of the regex:
([^-] )
match and capture in$1
the first term-
match a literal hyphen[^-]
match the second term (to be excluded in the replacement)-
match a literal hyphen(.*)
match and capture in$2
the remainder of the SKU
Then we replace with $1-$2
to effectively splice out the second term.