I have an enormous database containing people's data including (unique) name, country of location, birth date etc. I want to search for a list of people in this huge database the following way:
SELECT * FROM table1 WHERE
NAME = 'Alice' OR NAME = 'Bob' OR NAME = 'Charlie'
Given the size of this data set, obviously the more people I include in the WHERE clause, the longer the query takes. Lets say, I know for a fact that the people I look for is located in the UK. Would including the country in my query reduce the time of search? Would SQL know that trimming down the table according to country reduces the time to search for all these people I want? Or would it actually be just an additional criteria that it needs to look for, hence slowing down the performance? When there are multiple OR, AND sections, how does SQL know which order is the most efficient? My modified query would be:
SELECT * FROM table1 WHERE
COUNTRY = 'UK' AND (NAME = 'Alice' OR NAME = 'Bob' OR NAME = 'Charlie')
Would this be faster?
Edit: another thing I did not mention is that there are no indexes
CodePudding user response:
If the table has no indexes as you said, then the terms in your search are virtually irrelevant. The search will examine every row regardless. That is, a "table-scan."
Once the query examines a given row, it needs to evaluate at least some of the search terms. But the cost of fetching a row to examine is far higher than the search terms you show.
By analogy: suppose you're searching in a phone book. You decide to search the brute-force way, reading every page in the book. So now you're asking if it's more costly to search for one name or three names. The answer is that you're spending orders of magnitude more time reading every page in the book instead of searching efficiently based on the alphabetical order that it seems silly to be micro-optimizing about how many names you're searching for.
You would be far better off adding an index on the pair of columns: (COUNTRY, NAME)
in that order. Then the optimizer will immediately narrow down the search to the group of rows where COUNTRY='UK'
and within those, it will do a range search for the handful of names.
Just like if you searched a telephone book for one specific LAST_NAME
, and three different FIRST_NAME
s. Taking advantage of the sort order of the book would help you hugely.
You might like my presentation How to Design Indexes, Really, or the video.
CodePudding user response:
Including more criteria will actually speed up your query results. The more precise the data you are looking for the less data that needs to be returned. If I am looking for John. Then there will be a much smaller list of Johns if I only look in Texas.
CodePudding user response:
You can simplify
NAME = 'Alice' OR NAME = 'Bob' OR NAME = 'Charlie'
to
NAME IN ('Alice', 'Bob','Charlie')
The first query will benefit from
INDEX(name)
The second will benefit more from
INDEX(country, name)
Without a suitable index, it will read and look at every row in "this huge database". With a suitable index, it can jump to the right spot in the index, read a few rows, then look them up in the data. Much faster.
Also, be sure to have a PRIMARY KEY
.
CodePudding user response:
If you don't want to add indexes to the table, then the answer is easy:
A query that includes all conditions will be resolved with a single full table scan and will return all the rows you want.
Now, if you can add indexes you can assemble the query using UNION ALL
to achieve high performance.