SQL statement with conditional where clause-CodePudding

I have a table that i would like to filter on a particular data_src column where it can have the following possible values:

data_src data   users
srcA     x      user1
srcB     y      user1
srcC     z      user2
NULL     NULL   user2

Let's say these data_src have different data qualities: srcA > srcB > srcC > NULL

In my query, i only want to return the highest data quality for each user. So my query would produce:

data_src data   users
srcA     x      user1
srcC     z      user2

Also assume the order srcA, srcB, srcC cannot be sorted by alphanumeric order, but I do know the right order is srcA, srcB, srcC. How can I achieve this?

CodePudding user response：

select *
from table
qualify row_number() over (partition by users order by something_sortable) = 1

is how you do it if you have something that is "sortable" but if the there is no storability, you can go for random via order by true

So if we have a known rank conversion,

with table_name(data_src,data,users) as (
    select * from values
    ('srcA', 'x   ', 'user1'),
    ('srcB', 'y   ', 'user1'),
    ('srcC', 'z   ', 'user2'),
    (NULL, NULL, 'user2')
)
select *,
    case data_src when 'srcA' then 0 when 'srcB' then 1 when 'srcC' then 2 else 256 end as rank
from table_name
--qualify row_number() over (partition by users order by rank ) = 1

you can take the lowest value per partition item/s

DATA_SRC	DATA	USERS	RANK
srcA	x	user1	0
srcB	y	user1	1
srcC	z	user2	2
null	user2	256

for small rank functions it can be pushed inline into the qualify

with table_name(data_src,data,users) as (
    select * from values
    ('srcA', 'x   ', 'user1'),
    ('srcB', 'y   ', 'user1'),
    ('srcC', 'z   ', 'user2'),
    (NULL, NULL, 'user2')
)
select *
from table_name
qualify row_number() over (partition by users order by case data_src when 'srcA' then 0 when 'srcB' then 1 when 'srcC' then 2 else 256 end ) = 1

gives:

DATA_SRC	DATA	USERS
srcA	x	user1
srcC	z	user2

CodePudding user response：

I believe @Simeon is correct. You need some kind of "priority" associated with the data_src. You can create a new table with data_src and join to the table to use a priority column.

If the list of data_src is well known and finite you could create a CTE to hardcode the priority. Just be careful with NULLs. I believe Snowflake has the ability to be "NULL Safe" when it comes to equality comparisons if you want to go that direction.

See fiddle below (It's in SQL Server but the idea should be pretty easily translatable to Snowflake). It's unfortunate that SQL Server doesn't have the qualify statement.

https://dbfiddle.uk/3SSwHoYp