I want to join two large tables with many columns using Presto SQL syntax in AWS Athena. My code is pretty simple:
select
*
from TableA as A
left join TableB as B
on A.key_id = B.key_id
;
After joining, the primary key column (key_id
) is repeated two times. Both tables have more than 100 columns, and the joining takes very long. How can I fix it such that the key_id
column does not repeat twice in the final result?
P.S. AWS Athena does not support except
command, unlike Google BigQuery.
CodePudding user response:
This would be a nice feature, but is not part of standard SQL. The EXCEPT
keyword is a set-based operation (i.e. filtering rows).
In Athena, as with standard SQL, you will have to specify the columns you want to include. The argument for this is that it's lower maintenance, and in fact best practice is to always explicitly state the columns you want - never leaving this to "whatever columns exist". This will help ensure your queries don't change behaviour if/when your table structure changes.
Some SQL languages have features like this. I understand Oracle has this too. But to my knowledge Athena (/ PrestoSQL / Trino) does not.