SQL Server query improvement suggestion, aggregation of large amount of data in a query


My BI developer wrote a query that took 14 hours to run and I'm trying to help him out. At a high level, it's a query that explores financial transactions from the past 15 years and breaks them down by quarter.

I'm sharing the answers I already gave him here, but I would like to know if you have any suggestions for areas we could explore and research further to improve performance, along the lines of: "perhaps you may want to look at snapshots...".

His query consists of:

  • It uses multiple nested views, i.e. selecting from one view to build another view, etc.
  • Some views join three tables, each with around 100 - 200 million rows.
  • Certain views use sub-select queries.

Here are my recommendations so far:

  • Do not use nested views to build the query; instead of views, create new tables for each of them, because the data is static (historical financial transactions) and won't change. In my experience, nested views aren't good for performance.
  • Do not use subqueries; use JOINs whenever possible.
  • I made sure he creates non-clustered indexes wherever appropriate.
  • Do not use temp tables with this much data.
  • Try to use WITH (NOLOCK) on all tables used in a JOIN.
  • Find a common query and convert it into a stored procedure.
  • When joining those three large tables (100 - 200 million rows), try to limit the data at the JOIN instead of in the WHERE clause. For example, instead of SELECT * FROM tableA JOIN tableB ... WHERE ..., use SELECT * FROM tableA JOIN tableB ON ... AND tableA.date BETWEEN @start AND @end. This leaves less data to join with the other tables later in the query (see the sketch after this list).
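A minimal sketch of that last point, with hypothetical table and column names (tableA, tableB, transaction_id, transaction_date, and the date range are all assumptions, not his actual schema):

    -- date filter placed in the ON clause rather than in a trailing WHERE
    SELECT a.account_id, a.amount, b.cost_center
    FROM dbo.tableA AS a
    JOIN dbo.tableB AS b
        ON  b.transaction_id = a.transaction_id
        AND a.transaction_date >= '2010-01-01'
        AND a.transaction_date <  '2010-04-01';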

The problem is that the data he has to work with is simply huge; I suspect query tuning can only do so much because, at the end of the day, you still have to process all of that data. Perhaps the next step is to think about how to prepare the data and store it in smaller tables first, such as CostQ1_2010, CostQ2_2010, etc., and then write the query against those tables.
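A hedged sketch of that staging idea, assuming a Transactions source table and made-up column names:

    -- materialize one quarter into its own table (all names are illustrative)
    SELECT account_id,
           SUM(amount) AS total_amount,
           COUNT(*)    AS txn_count
    INTO   dbo.CostQ1_2010
    FROM   dbo.Transactions
    WHERE  transaction_date >= '2010-01-01'
      AND  transaction_date <  '2010-04-01'
    GROUP  BY account_id;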

CodePudding user response:

You have given us very little information to go on. Tolstoy wrote, "All happy families are alike; each unhappy family is unhappy in its own way." That's also true of SQL queries, especially big BI queries.

I'll risk some general answers.

With tables of the size you mention, your query surely contains date-range WHERE filters like transaction_date >= something AND transaction_date < something_else. In general, a useful report covers a year out of a decade of transactions. So make sure you have the right indexes to allow index seeks over those date ranges where possible. SSMS, if you choose the Show Actual Execution Plan feature, sometimes suggests indexes.
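As a rough illustration, assuming a Transactions table filtered by transaction_date (names are hypothetical):

    -- nonclustered index supporting the date-range filter
    CREATE NONCLUSTERED INDEX IX_Transactions_TransactionDate
        ON dbo.Transactions (transaction_date);

    -- sargable range predicate that can use an index seek
    SELECT transaction_id, account_id, amount
    FROM   dbo.Transactions
    WHERE  transaction_date >= '2014-01-01'
      AND  transaction_date <  '2015-01-01';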

Learn to read execution plans.

Read about covering indexes. They can sometimes make a big difference.
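For example, a covering index for the range query sketched above might look like this (column names remain assumptions):

    -- the INCLUDE columns let the query be answered from the index alone,
    -- avoiding key lookups into the base table
    CREATE NONCLUSTERED INDEX IX_Transactions_Date_Covering
        ON dbo.Transactions (transaction_date)
        INCLUDE (transaction_id, account_id, amount);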

Use the statement SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED before starting this kind of long-running historical BI query. You'll get less interference between the BI query and other activity on the database.
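In T-SQL that is simply:

    -- run once at the top of the session/batch, before the long BI query
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

Like WITH (NOLOCK), this allows dirty reads, which is usually acceptable for historical data that no longer changes.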

It may make sense to preload some denormalized tables from the views used in the BI query.
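A hedged sketch of that preload step, assuming one of the nested views is called vw_QuarterlyCost (the view and column names are made up):

    -- materialize the view's result once, then build the report from the table
    SELECT *
    INTO   dbo.QuarterlyCost_Staging
    FROM   dbo.vw_QuarterlyCost;

    -- index the staging table on whatever the report joins or filters on
    CREATE CLUSTERED INDEX CIX_QuarterlyCost_Staging
        ON dbo.QuarterlyCost_Staging (quarter_start_date);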
