Home > OS >  Nested LEFT OUTER JOINS takes too long compared to programmatical solution
Nested LEFT OUTER JOINS takes too long compared to programmatical solution

Time:01-25

I have an SQL query generated by Entity Framework that contains a two level nested LEFT OUTER JOIN on tables TableA, TableB and TableC1 and TableC2 with the following foreign key relations as the arrows indicate.

TableA->TableB->TableC1
              ->TableC2

TableA contains 1000 rows, and all other tables contain around 100000 rows The SQL statement looks something like this:

select * from TableA A
  LEFT JOIN TableB B on A.Id = B.TableAId
  LEFT JOIN TableC1 C1 on B.Id = C1.TableBId
  LEFT JOIN TableC2 C2 on B.Id = C2.TableBId

When SQL is executed on Microsoft SQL Server, it takes around 30 seconds. However, if I do a select of each table and retrieve the rows as lists in memory and join them programmatically in C#, it takes 3 seconds or so. Can anyone give any indication of why sql server is so slow?

Thanks

CodePudding user response:

When you use joins in SQL you create something called a Cartesian Product.

For example, if I have 2 tables, Table A and Table B, Table A has 10 rows and each row has 10 Table B references:

If I load these 2 tables separately, I am loading 110 rows. 10 rows from table A, and 100 rows from table B.

If I JOIN these tables I am loading 100 rows, however, those hundred rows each represent the combine data of both tables. If Table A has 10 columns, and Table B has 20 columns, the total data read loading the tables separately would be 10x10 100x20 or 2100 columns worth of data. With a JOIN, I am loading 30x100 or 3000 columns worth of data. That's not a huge difference, but it compounds as I join more tables.

If each Table B has a Table C with an average of 5 rows and 10 columns, loaded separately that would add 5000 (500x10) or now 7600 columns worth of data. When Joined, that becomes 3000x5x10, or 150,000 columns worth of total data being loaded into memory or sifted through. You should see how this can and does snowball quite quickly if you start doing SELECT * FROM ... with Joins.

When EF goes to build a query where you are loading entity graphs (related entities) the resulting query will often use JOINs resulting in these Cartesian results that it loads, then sifts through to build the resulting object graph, condensing the results back down into the 10 A's with 10 B's and 5 C's, etc. But it still takes memory and time to chew through all of that flattened resulting data. EF Core can offer query splitting to essentially execute more like what your counter-comparison was, loading the related tables separately to piece together, greatly reducing the total amount of data being read.

Ultimately to improve performance of queries generated by EF:

  1. Use Select or Automapper's ProjectTo to select just the values from the related tables, rather than loading Entities /w Include to eager load related entities when reading "sets" of entities like search results. Load entities /w Include for single entities like when updating one.
  2. Ensure that when querying the above data, inspect the execution plans for index suggestions.
  3. If you do need to load large amounts of related data, consider use query splitting.
  • Related