SQL nested select where condition is met or if row not in table-CodePudding

The title is not clear as I don't know how to express the problem in one sentence.

I have a table named companies, looks like this:

Symbol (PK)	Exchange (PK)
TSLA	NASDAQ
AMZN	NASDAQ
MCSFT	NASDAQ

And then I have a table where I record the last crawl operations I made. The idea is having a script that can restart crawling everyday without recrawling companies that have been done.

Crawl_Settings

Symbol (PK/FK on companies)	Exchange(PK/FK)	Last crawl date	Number of articles crawled
TSLA	NASDAQ	19/11/2021	20
AMZN	NASDAQ	12/11/2021	20

I thought about making a query that would get me all the companies that have a last crawl date that is not today.

Select * from companies where symbol,exchange in (
  select symbol,exchange from crawl_settings where date != today
)

I think that this query will work, but won't get me the companies that have never been crawled. Expected result:

Symbol (PK)	Exchange (PK)
AMZN	NASDAQ
MCSFT	NASDAQ

explanation: AMZN has an old crawl date, MCSFT has never been crawled.

Actual result:

Symbol (PK)	Exchange (PK)
AMZN	NASDAQ

CodePudding user response：

You can use an anti-join to get the rows you want. For example:

select c.*
from companies c
left join crawl_settings s on s.symbol = c.symbol 
                          and s.exchange = c.exchange
                          and s.date = today
where s.symbol is null

CodePudding user response：

Impaler's solution is objectively better, because NOT IN has some finicky behavior. But just putting it out there that you were very close to being on the right track.

Just tweak where your inequality is.

Select * from companies where symbol,exchange not in (
  select symbol,exchange from crawl_settings where date = today
)