I have two tables
showroom
============================================
model_id car_name is_available
--------------------------------------------
0 rav4 false
1 tacoma false
2 corolla false
3 tundra false
and
warehouse
========================
model_id car_name
------------------------
0 rav4
1 tacoma
The primary key is model_id
. I want to update the column showroom.is_available
to true if the model_id
is present in warehouse
. So the output according to the above example would be
showroom
============================================
model_id car_name is_available
--------------------------------------------
0 rav4 true
1 tacoma true
2 corolla false
3 tundra false
I know I can do
UPDATE showroom
SET is_available = true
WHERE model_id IN (
SELECT model_id
FROM warehouse
);
But both tables have over a billion rows, and even though I am using Spark SQL the queries are very slow
CodePudding user response:
SELECT
a.model_id
,a.car_name
,CASE WHEN b.model_id IS NOT NULL THEN TRUE ELSE FALSE END is_available
FROM
showroom a
LEFT OUTER JOIN warehouse b ON (a.model_id = b.model_id)
CodePudding user response:
On big tables a join is faster than the IN clause
And see if you can#t reduce the number
For showroom you can use a combined index on (model_id,is_available)
When you are able also reduce the number of possible candidates
UPDATE showroom s
INNER JOIN warehouse w ON s.model_id = w.model_id
SET s.is_available = true
WHERE s.is_available = FALSE