I am trying to retrieve records based on a custom field "ci_ku". For the same values of "ci_ku" we will be having multiple "l1m_visits", and I want to retrieve the minimum value of "l1mvisits" for each "ci_ku".
Sample Data:
ku | ci_ku | l1m_visits |
---|---|---|
1234-5678-HIJK | 1234-HIJK | A |
1234-9012-HIJK | 1234-HIJK | B |
Expected Output:
ku | ci_ku | l1m_visits |
---|---|---|
1234-5678-HIJK | 1234-HIJK | A |
Have tried the query below:
SELECT DISTINCT REGEXP_REPLACE(ku, CONCAT('-',CAST(v_nbr AS varchar)), '') AS ci_ku,
ku,
MIN(l1m_visits),
last_refresh_date
FROM db.schema.table
GROUP BY ci_ku;
and facing the following error:
line 1:194: Column 'ci_ku' cannot be resolved
CodePudding user response:
That error is fired because the field "ci_ku" is not yet generated when the GROUP BY
clause is evaluated. Further there are some more issues in your query:
- not all non-aggregated rows are found within the
GROUP BY
clause ("ku" and "last_refresh_date" should be included) - the
DISTINCT
keyword will remove duplicate rows, though there are none after yourSELECT
statement.
Instead of using aggregation, the ROW_NUMBER
window function may get your result faster. It will generate an incremental number for each of your "ci_ku" values (PARTITION BY ci_ku
) and ordered by "l1m_visits" (ORDER BY ci_ku
), such that your row number equal to 1 will represent the lowest "l1m_visits" for each "ci_ku".
WITH tab_with_ci_ku AS (
SELECT REGEXP_REPLACE(ku, CONCAT('-',CAST(v_nbr AS varchar)), '') AS ci_ku,
ku,
l1m_visits,
last_refresh_date
FROM db.schema.table
), ranked_visits AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY ci_ku ORDER BY l1m_visits) AS rn
FROM tab_with_ci_ku
)
SELECT ku,
ci_ku,
l1m_visits
FROM ranked_visits
WHERE rn = 1
If you're using PostgreSQL, you can also use the FETCH n ROWS WITH TIES
clause that retrieves the first row for each tied row number (it will pick the each row number = 1):
WITH tab_with_ci_ku AS (
SELECT REGEXP_REPLACE(ku, CONCAT('-',CAST(v_nbr AS varchar)), '') AS ci_ku,
ku,
l1m_visits,
last_refresh_date
FROM db.schema.table
)
SELECT ku,
ci_ku,
l1m_visits
FROM ranked_visits
ORDER BY ROW_NUMBER() OVER(PARTITION BY ci_ku ORDER BY l1m_visits)
FETCH FIRST 1 ROWS WITH TIES;