The title is not clear as I don't know how to express the problem in one sentence.
I have a table named companies, looks like this:
Symbol (PK) | Exchange (PK) |
---|---|
TSLA | NASDAQ |
AMZN | NASDAQ |
MCSFT | NASDAQ |
And then I have a table where I record the last crawl operations I made. The idea is having a script that can restart crawling everyday without recrawling companies that have been done.
Crawl_Settings
Symbol (PK/FK on companies) | Exchange(PK/FK) | Last crawl date | Number of articles crawled |
---|---|---|---|
TSLA | NASDAQ | 19/11/2021 | 20 |
AMZN | NASDAQ | 12/11/2021 | 20 |
I thought about making a query that would get me all the companies that have a last crawl date that is not today.
Select * from companies where symbol,exchange in (
select symbol,exchange from crawl_settings where date != today
)
I think that this query will work, but won't get me the companies that have never been crawled. Expected result:
Symbol (PK) | Exchange (PK) |
---|---|
AMZN | NASDAQ |
MCSFT | NASDAQ |
explanation: AMZN has an old crawl date, MCSFT has never been crawled.
Actual result:
Symbol (PK) | Exchange (PK) |
---|---|
AMZN | NASDAQ |
CodePudding user response:
You can use an anti-join to get the rows you want. For example:
select c.*
from companies c
left join crawl_settings s on s.symbol = c.symbol
and s.exchange = c.exchange
and s.date = today
where s.symbol is null
CodePudding user response:
Impaler's solution is objectively better, because NOT IN has some finicky behavior. But just putting it out there that you were very close to being on the right track.
Just tweak where your inequality is.
Select * from companies where symbol,exchange not in (
select symbol,exchange from crawl_settings where date = today
)