Say we have two tables performing a left join:
Table 1
Joint Key || Attribute 1 || Attribute 2 || Attribute 3
A 1 11 21
B 2 12 22
C 3 13 23
Table 2
Joint Key || Attribute 4 || Attribute 5
A 31 41
A 32 42
C 33 43
by performing a table 1 left join table 2 on "Joint Key"
it will return two records having
Joint Key = 'A'
Joint Key || Attribute 1 || Attribute 2 || Attribute 3 || Attribute 4 || Attribute 5
A 1 11 21 31 41
A 1 11 21 32 42
What's the best practice of defining the return police, specifically in snowflake, that can return me the same row count as table 1.
Taking the above example, I want the the record has the MAX(Attribute 4)
. Two initial ideas come to my mind
Option 1: use "GROUP BY" clause -- need list columns explicitly, cumbersome when dealing with table has many columns.
Option 2: something like
select * from (
select
Tabel1.*
max(Table2.Attribute_4) as mx_Attribute_4,
Table2.Attribute_5
from Table1
left join Table2
on Joint_Key
) as temp
where temp.Attribute_4 = temp.mx_Attribute_4
it's quite complicated and time-consuming too.
Any other suggestions?
CodePudding user response:
You seem to have some confused ideas about how joins work. If you have Table1 left join Table2 then it will return all the records from Table1 with any data from matching records in Table2 - so in your case you would normally get the 3 records from Table1.
However, in your case you have 2 records in table2 that matches 1 record in table 1 so this will duplicate your results and you will get 4 records: 2 with key A and then 1 with B and 1 with C.
Anyway, given the example data you’ve provided, please update your question with the result you want to achieve so that someone can help you
CodePudding user response:
you could use QUALIFY
Something like:
select
t1.Joint_key, t1.Attribute_1, t1.Attribute_2, t1.Attribute_3, t2.Attribute_4, t2.Attribute_5
from Table1 t1
left join Table2 t2
on t1.Joint_key = t2.Joint_key
qualify row_number() over(partition by Joint_Key order by Attribute_4 desc) = 1
This is certainly more clean, and should be more efficient than a group by. It does still require the query to sort records by Attribute_4, but I don't see a way of avoiding that unless you are ok with using any of the sets of values instead of the one with MAX(Attribute_4). In that case you could be more efficient by using order by 1
in the row_number()
window function.