Distinct keyword is very slow when used in where clause. Any replacement for this that can speed up-CodePudding

Suppose there are 2 tables - s1 and r1.

I have to return all rows of s1 only where s1.Salary is present in r1.Salary. Here Salary is a column name.

Now the below query works fine but it is extremely slow for very big datasets.

SELECT * FROM s1 where s1.Salary in  (select distinct r1.Salary from r1)

Is there any way to speed up this query or a different query to perform the same thing?

CodePudding user response：

I suggest you to use a Join. Should perform better:

SELECT s1.* 
FROM s1 
INNER JOIN (select distinct Salary from s1) q ON s1.Salary = q.Salary

CodePudding user response：

You can use a CTE to limit the data to only the r1 distinct data then join it back to the CTE I suppose

WITH R_CTE
AS
(
SELECT DISTINCT SALARY FROM R1
)
SELECT * FROM S1 S JOIN R_CTE R ON R.SALARY = S.SALARY

CodePudding user response：

Create an index on the salary column in both tables:

create index ix1 on s1(salary)
create index ix1 on r1(salary)

That should speed up the matching.

CodePudding user response：

Use EXISTS on an index, instead of materializing data using DISTINCT.

Create the index:

create index ix1 on r1 (salary);

Then, the query can take the form:

select *
from s1
where exists (select 1 from r1 where r1.salary = s1.salary)