I'm trying to run a query in Snowflake that uses the rank() function along with qualify to try and take the best match between two columns (using editdistance) and I'm getting the following error:
Invalid data type [NUMBER(18,0)] for predicate [RANK() OVER (PARTITION BY key1 ORDER BY editdistance(name1, name2) ASC NULLS LAST)]
Query looks like this:
select
key1,
key2
from
table
qualify
rank() over (
partition by key1 order by editdistance(name1, name2) asc
)
I'm really not clear on what number the error is referring to. name1/name2 are both varchar, key1 is a varchar. Hoping someone might have a pointer for me.
Thanks, Jeff
CodePudding user response:
The QUALIFY works similar to WHERE,but it is evaluated at different point during query execution. QUALIFY requires a predicate, which should evaluate to boolean.
QUALIFY <predicate>
The expression:
rank() over (partition by key1 order by editdistance(name1, name2))
returns NUMBER.
Comparing the result to integer fixes the error:
rank() over (partition by key1 order by editdistance(name1, name2)) = 1
It is worth noting that there may be tie in the data, i.e, same distance per key for more than one row, for that situation it is safer to use DENSE_RANK:
dense_rank() over (partition by key1 order by editdistance(name1, name2)) = 1
CodePudding user response:
Dean identified the problem in his comment above so just moving it to an answer to close the loop on this:
You need to provide a value for the QUALIFY function so it should be something like: qualify rank() over ( partition by key1 order by editdistance(name1, name2) asc ) = 1