I am trying to write a query in azure databricks and I am getting the following error
"IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few commands"
This is the code I am using.
SELECT id,
(CASE WHEN id in (SELECT id from aTable) THEN 1 ELSE 0 END) as a,
(CASE WHEN id in (SELECT id from bTable) THEN 1 ELSE 0 END) as b,
(CASE WHEN id in (SELECT id from cTable) THEN 1 ELSE 0 END) as c
FROM table
I read that sql doesn't let you do this because the case statements are evaluated row by row, and it wants to prevent you from doing a SELECT statement for each row evaluation. If that is the case, is there an alternative or workaround to accomplish this? Thanks
CodePudding user response:
Databricks does not support subqueries using IN or EXISTS in CASE statements. As an alternative, consider outer joining each view to master table:
Query could be like then structure below:
select .....
case when a.id is not null then a
when b.id is not null then b
end as id
from Table_t t LEFT JOIN (select id from aTable ) a ON t.id=a.id LEFT JOIN(
select id from bTable) b ON t.id=b.id
..................
CodePudding user response:
I tried to reproduce similar scenario and got same error:
Regardless of whether it is contained in a CASE WHEN, the IN operator utilising a subquery only functions in filters, not projections. If you explicitly supply values in the IN clause as opposed to using a subquery, it works just great.
To work around this, I tried left join to tables and then check for a null in the case statement.
This query might work
%sql
SELECT t.Id,
(CASE WHEN at.Id is not null THEN 1 ELSE 0 END) as a,
(CASE WHEN bt.Id is not null THEN 1 ELSE 0 END) as b,
(CASE WHEN ct.Id is not null THEN 1 ELSE 0 END) as c
FROM table t
LEFT JOIN aTable at ON t.Id = at.Id
LEFT JOIN bTable bt ON t.Id = bt.Id
LEFT JOIN cTable ct ON t.Id = ct.Id
Sample data:
Output: