I have a table that i would like to filter on a particular data_src column where it can have the following possible values:
data_src data users
srcA x user1
srcB y user1
srcC z user2
NULL NULL user2
Let's say these data_src have different data qualities: srcA > srcB > srcC > NULL
In my query, i only want to return the highest data quality for each user. So my query would produce:
data_src data users
srcA x user1
srcC z user2
Also assume the order srcA, srcB, srcC cannot be sorted by alphanumeric order, but I do know the right order is srcA, srcB, srcC. How can I achieve this?
CodePudding user response:
select *
from table
qualify row_number() over (partition by users order by something_sortable) = 1
is how you do it if you have something that is "sortable" but if the there is no storability, you can go for random via order by true
So if we have a known rank conversion,
with table_name(data_src,data,users) as (
select * from values
('srcA', 'x ', 'user1'),
('srcB', 'y ', 'user1'),
('srcC', 'z ', 'user2'),
(NULL, NULL, 'user2')
)
select *,
case data_src when 'srcA' then 0 when 'srcB' then 1 when 'srcC' then 2 else 256 end as rank
from table_name
--qualify row_number() over (partition by users order by rank ) = 1
you can take the lowest value per partition item/s
DATA_SRC | DATA | USERS | RANK |
---|---|---|---|
srcA | x | user1 | 0 |
srcB | y | user1 | 1 |
srcC | z | user2 | 2 |
null | user2 | 256 |
for small rank functions it can be pushed inline into the qualify
with table_name(data_src,data,users) as (
select * from values
('srcA', 'x ', 'user1'),
('srcB', 'y ', 'user1'),
('srcC', 'z ', 'user2'),
(NULL, NULL, 'user2')
)
select *
from table_name
qualify row_number() over (partition by users order by case data_src when 'srcA' then 0 when 'srcB' then 1 when 'srcC' then 2 else 256 end ) = 1
gives:
DATA_SRC | DATA | USERS |
---|---|---|
srcA | x | user1 |
srcC | z | user2 |
CodePudding user response:
I believe @Simeon is correct. You need some kind of "priority" associated with the data_src
. You can create a new table with data_src
and join to the table to use a priority
column.
If the list of data_src
is well known and finite you could create a CTE to hardcode the priority. Just be careful with NULL
s. I believe Snowflake has the ability to be "NULL Safe" when it comes to equality comparisons if you want to go that direction.
See fiddle below (It's in SQL Server but the idea should be pretty easily translatable to Snowflake). It's unfortunate that SQL Server doesn't have the qualify
statement.