I've got a table with hundreds of thousands of rows. Each one relating to an item listed to be sold, along with when the item was placed on the market and when it was sold, (if it has been sold yet.)
I'm looking to calculate the number of items that were active in a given month with other given criteria, (the other criteria has enough permutations that pre-calculating these numbers would not be ideal.) I'm trying to retrieve 120 months worth of data and average it on a webpage so ideally the entire query would be as fast as possible. I've got a query that works... but it takes 5 seconds to run for only 12 months of data and I need it to be at least 10x faster than that so I can display a 10 year average in a decent time.
Here is an example of what I'm using now:
SELECT Avg(inv)
FROM (
SELECT
CASE
WHEN Count(*) =0 THEN NULL
ELSE Count(*)
END AS inv
FROM listings
WHERE originalentrytimestamp <= @DateOfMonthEnd
AND @DateOfMonthEnd < @currentMonthStart
AND (
offmarketdate > @DateOfMonthEnd
OR offmarketdate IS NULL)
AND city='New York'
AND listprice >= 0
AND listprice <= 999999999
AND category1='RESI'
AND category2 IN('D','A','S','R','O')
UNION ALL
SELECT
CASE
WHEN Count(*) =0 THEN NULL
ELSE Count(*)
END AS inv
FROM listings
WHERE originalentrytimestamp <= Dateadd(month, 1,@DateOfMonthEnd)
AND Dateadd(month, 1,@DateOfMonthEnd) < @currentMonthStart
AND (
offmarketdate > Dateadd(month, 1,@DateOfMonthEnd)
OR offmarketdate IS NULL)
AND city='New York'
AND listprice >= 0
AND listprice <= 999999999
AND category1='RESI'
AND category2 IN('D','A','S','R','O')
UNION ALL
SELECT
CASE
WHEN Count(*) =0 THEN NULL
ELSE Count(*)
END AS inv
FROM listings
WHERE originalentrytimestamp <= Dateadd(month, 2,@DateOfMonthEnd)
AND Dateadd(month, 2,@DateOfMonthEnd) < @currentMonthStart
AND (
offmarketdate > Dateadd(month, 2,@DateOfMonthEnd)
OR offmarketdate IS NULL)
AND city='New York'
AND listprice >= 0
AND listprice <= 999999999
AND category1='RESI'
AND category2 IN('D','A','S','R','O')
)
I've only included 3 months in this code for brevity, but this quickly gets unwieldly if I want to run 120 months. It's also slow, but I'm not sure how to re-word it to use a window function or group by clause as a single listing might be counted in both January and February if it took a while to sell.
The case statement is to allow the code to be used to calculate year-to-date averages as well where the code might include a future month with no data and that should be excluded from the average, (but it wouldn't be necessary if the entire thing can be re-coded in a format that accepts a start and end month rather than hardcoding 120 queries.)
edit from comments discussion:
- execution plan: https://brentozar.com/pastetheplan/?id=rJtyNali5
- table structure and indexes: https://pastebin.com/UxRTSDAx
This table is a copy of the live data made daily and not written to mid-day, any indexes can be added or removed freely to speed up the query, (but the example where clauses for city, list price, etc are only examples, we have the indexes we have so far because we might search on different clauses there.)
CodePudding user response:
Try a covering index on (city, category1, category2, originalentrytimestamp)
including the offmarketdate
and listprice
columns. DDL like this might create the index you need.
CREATE NONCLUSTERED INDEX [CityCategoriesTimestamp] ON [dbo].[listings]
(
[city] ASC,
[category1] ASC,
[category2] ASC,
[originalentrytimestamp] ASC
)
INCLUDE (
[offmarketdate] ASC,
[listprice] ASC
);
This works because the index can be scanned sequentially from the first eligible row to the last. That's faster than scanning the whole table.
And, consider reworking your query to use EOMONTH(). Get rid of the big UNION ALL
cascade, instead doing GROUP BY(originalentrytimestamp)
. I'd suggest a rewrite but I'm not sure I understand your business logic well enough to do that.
CodePudding user response:
I ended up getting around this by doing a pre-calculation of these inventory levels for a bunch of possible criteria. It's not ideal as it limits what other criteria I can apply to this search and the pre-calc table is taking a significant amount of space, but it will work for the specific business case I'm working with.