Which is more efficient, SQL join or concurrent SQL queries?


Is it generally more efficient to use a JOIN query instead of manually running two queries (concurrently) and joining them in code?

For example: I have two tables, customer and address.

Option 1: SELECT * FROM customer c LEFT JOIN address a ON c.id=a.customer WHERE c.id=123

Option 2: SELECT * FROM customer WHERE id=123, then SELECT * FROM address WHERE customer=123, and combine the two result sets in code
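
For illustration, here is a minimal sketch of what Option 2 might look like in application code, using Python's built-in sqlite3 module; the columns and data are invented for the example, not taken from a real schema:

```python
import sqlite3

# Toy in-memory schema; table/column contents are assumptions for the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address  (id INTEGER PRIMARY KEY, customer INTEGER, street TEXT);
    INSERT INTO customer VALUES (123, 'Alice');
    INSERT INTO address  VALUES (1, 123, 'Main St 1'), (2, 123, 'Oak Ave 2');
""")

# Option 2: two separate queries ...
customer = conn.execute(
    "SELECT id, name FROM customer WHERE id = ?", (123,)
).fetchone()
addresses = conn.execute(
    "SELECT id, customer, street FROM address WHERE customer = ?", (123,)
).fetchall()

# ... combined ("joined") in application code.
result = {"customer": customer, "addresses": addresses}
print(result)
```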

  • Is one of the two approaches always better than the other?
  • What is the best way to compare these?

CodePudding user response:

A DBMS is made to get the data you require quickly. It can easily deal with many tables in one query and still produce the results rather fast. It is unlikely that your own code will combine the tables faster than the DBMS (but it's still possible).

Your example, though, is a very simple one. As this is about one customer only, there is one row to retrieve from the customer table and only a few rows from the address table. Not much to join there. You get the data back more or less ready to use.

One point even with these queries, though: when querying data from a database, there is a dialogue taking place. Your app sends the SQL string to the DBMS. The DBMS parses the string, checks its validity, creates an access plan and then tells your app what columns (names and types) the query is going to return. Your app tells the DBMS "okay, send me the data" and the DBMS sends the first block of data. With two queries, this dialogue (possibly over a network) takes place twice. These exchanges are called round trips, and they are something we want to reduce in order to get good performance. Besides, parsing the query string and building the execution plan also take a little time in the DBMS, so doing that once is better than doing it twice.
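
A rough way to compare the two approaches is simply to time them. The sketch below runs both variants many times against an in-memory SQLite database with an invented toy schema; with no network in between, the round-trip cost is heavily understated, so against a remote DBMS the gap in favour of the single query will usually be larger:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address  (id INTEGER PRIMARY KEY, customer INTEGER, street TEXT);
    INSERT INTO customer VALUES (123, 'Alice');
    INSERT INTO address  VALUES (1, 123, 'Main St 1'), (2, 123, 'Oak Ave 2');
""")

def one_join():
    # One statement: one parse/plan step, one round trip.
    conn.execute(
        "SELECT * FROM customer c LEFT JOIN address a ON c.id = a.customer WHERE c.id = 123"
    ).fetchall()

def two_queries():
    # Two statements: two parse/plan steps, two round trips.
    conn.execute("SELECT * FROM customer WHERE id = 123").fetchall()
    conn.execute("SELECT * FROM address WHERE customer = 123").fetchall()

for label, fn in (("one JOIN", one_join), ("two queries", two_queries)):
    start = time.perf_counter()
    for _ in range(10_000):
        fn()
    print(label, round(time.perf_counter() - start, 3), "s")
```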

When it comes to more complex queries, the DBMS will try to find the optimal execution plan (and is usually very good at that). Should it read a table sequentially or rather look up some rows via an index? Should it loop through the rows of one table and, for each row, find the matches in another table in order to join them (a nested loop join)? Or should it rather sort the two tables first and then loop through both sorted lists using a control-break algorithm (a sort-merge join)? Or should it create buckets and join the tables on hashes (a hash join)? The DBMS usually has indexes available to help with looking up data, and sometimes it is a good idea to use them and sometimes not. The DBMS may even have statistics about your data, knowing, for instance, the highest and the average number of addresses per customer. All this is used to query your data in an optimal way.
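
If you want to see which of these strategies your DBMS actually chooses, ask it for the plan. The exact statement varies by system (for example, EXPLAIN or EXPLAIN ANALYZE in PostgreSQL and EXPLAIN in MySQL); the sketch below uses SQLite's EXPLAIN QUERY PLAN against an assumed toy schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address  (id INTEGER PRIMARY KEY, customer INTEGER, street TEXT);
    CREATE INDEX idx_address_customer ON address(customer);
""")

# Ask the DBMS how it intends to execute the join; the plan typically shows
# whether it scans a table or looks rows up via an index.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT * FROM customer c LEFT JOIN address a ON c.id = a.customer WHERE c.id = 123
""").fetchall()

for row in plan:
    print(row)
```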

Of course, you usually know your data, too. So you can also decide to use hash mechanisms, binary searches, etc. when appropriate, perhaps one technique for one query and another technique for the next. But this requires skill, is a lot of work, and may still be slower in the end.
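
As an illustration of what "doing it yourself" means, here is a small sketch of a hand-rolled hash join in application code: fetch both tables, hash the address rows by customer id, then probe that hash table once per customer. Schema and data are invented for the example:

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address  (id INTEGER PRIMARY KEY, customer INTEGER, street TEXT);
    INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO address  VALUES (10, 1, 'Main St 1'), (11, 1, 'Oak Ave 2'), (12, 2, 'Pine Rd 3');
""")

customers = conn.execute("SELECT id, name FROM customer").fetchall()
addresses = conn.execute("SELECT id, customer, street FROM address").fetchall()

# Build phase: hash the address rows by customer id.
by_customer = defaultdict(list)
for addr_id, cust_id, street in addresses:
    by_customer[cust_id].append((addr_id, street))

# Probe phase: look up each customer's addresses in the hash table.
joined = [(cust_id, name, by_customer.get(cust_id, [])) for cust_id, name in customers]
print(joined)
```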

But, well, it is of course possible that you have a super fast computer on your desk and the database server is an old machine you got from your grandfather. Then it may be a good idea just to retrieve the raw data and do the processing on your PC. As a rule of thumb, though: let the DBMS do this for you. It will almost always be the faster approach.
