Home > database >  SQL nested select where condition is met or if row not in table
SQL nested select where condition is met or if row not in table

Time:11-20

The title is not clear as I don't know how to express the problem in one sentence.

I have a table named companies, looks like this:

Symbol (PK) Exchange (PK)
TSLA NASDAQ
AMZN NASDAQ
MCSFT NASDAQ

And then I have a table where I record the last crawl operations I made. The idea is having a script that can restart crawling everyday without recrawling companies that have been done.

Crawl_Settings

Symbol (PK/FK on companies) Exchange(PK/FK) Last crawl date Number of articles crawled
TSLA NASDAQ 19/11/2021 20
AMZN NASDAQ 12/11/2021 20

I thought about making a query that would get me all the companies that have a last crawl date that is not today.

Select * from companies where symbol,exchange in (
  select symbol,exchange from crawl_settings where date != today
)

I think that this query will work, but won't get me the companies that have never been crawled. Expected result:

Symbol (PK) Exchange (PK)
AMZN NASDAQ
MCSFT NASDAQ

explanation: AMZN has an old crawl date, MCSFT has never been crawled.

Actual result:

Symbol (PK) Exchange (PK)
AMZN NASDAQ

CodePudding user response:

You can use an anti-join to get the rows you want. For example:

select c.*
from companies c
left join crawl_settings s on s.symbol = c.symbol 
                          and s.exchange = c.exchange
                          and s.date = today
where s.symbol is null

CodePudding user response:

Impaler's solution is objectively better, because NOT IN has some finicky behavior. But just putting it out there that you were very close to being on the right track.

Just tweak where your inequality is.

Select * from companies where symbol,exchange not in (
  select symbol,exchange from crawl_settings where date = today
)
  • Related