I have a table called "Domains" with field "Name" (unique, always lowercase) which contains a list of domains and subdomains on my server like:
- blah.example.com
- www.example.com
- www.blah.example.com
- example.com
- example.nl
- example.org
Looking at this list, names 1, 2 and 3 are subdomains of item 4. And I'm looking to just find all domains in this table without these subdomains. Or, to be more precise, any name that does not have part of it in the name from another record. Thus only item 4, 5 and 6.
If record 4 was missing then this query would also have item 1 and 2 as result, but not item 3. After all, item 3 has item 1 as part of it.
Just trying to find the query that can provide me this result... Something with select d.name from domains where d.name not in...
Well, there my mind goes blank.
Why?
This list of domains is generated by my web server which registers every new domain that gets requested on it. I'm working on a reporting page where I would display the top domain names to see if there are any weird domains in it. For some reason, I sometimes see unknown domain names in these requests and this might give some additional insight in it all.
I am going to change my code so it will include references to parent domains in the same table in the future but for now I'll have to deal with this and a simple SQL solution.
CodePudding user response:
Use a self-join that matches on suffixes using LIKE
SELECT d1.name
FROM domains AS d1
LEFT JOIN domains AS d2 ON d1.name LIKE CONCAT('%.', d2.name)
WHERE d2.name IS NULL