Faster way to calculate monthly inventory over huge table-CodePudding

I've got a table with hundreds of thousands of rows. Each one relating to an item listed to be sold, along with when the item was placed on the market and when it was sold, (if it has been sold yet.)

I'm looking to calculate the number of items that were active in a given month with other given criteria, (the other criteria has enough permutations that pre-calculating these numbers would not be ideal.) I'm trying to retrieve 120 months worth of data and average it on a webpage so ideally the entire query would be as fast as possible. I've got a query that works... but it takes 5 seconds to run for only 12 months of data and I need it to be at least 10x faster than that so I can display a 10 year average in a decent time.

Here is an example of what I'm using now:

SELECT Avg(inv)
FROM   (
    SELECT
           CASE
                  WHEN Count(*) =0 THEN NULL
                  ELSE Count(*)
           END AS inv
    FROM   listings
    WHERE  originalentrytimestamp <= @DateOfMonthEnd
    AND    @DateOfMonthEnd < @currentMonthStart
    AND    (
                  offmarketdate > @DateOfMonthEnd
           OR     offmarketdate IS NULL)
    AND    city='New York'
    AND    listprice >= 0
    AND    listprice <= 999999999
    AND    category1='RESI'
    AND    category2 IN('D','A','S','R','O')
    UNION ALL
    SELECT
           CASE
                  WHEN Count(*) =0 THEN NULL
                  ELSE Count(*)
           END AS inv
    FROM   listings
    WHERE  originalentrytimestamp <= Dateadd(month, 1,@DateOfMonthEnd)
    AND    Dateadd(month, 1,@DateOfMonthEnd) < @currentMonthStart
    AND    (
                  offmarketdate > Dateadd(month, 1,@DateOfMonthEnd)
           OR     offmarketdate IS NULL)
    AND    city='New York'
    AND    listprice >= 0
    AND    listprice <= 999999999
    AND    category1='RESI'
    AND    category2 IN('D','A','S','R','O')
    UNION ALL
    SELECT
           CASE
                  WHEN Count(*) =0 THEN NULL
                  ELSE Count(*)
           END AS inv
    FROM   listings
    WHERE  originalentrytimestamp <= Dateadd(month, 2,@DateOfMonthEnd)
    AND    Dateadd(month, 2,@DateOfMonthEnd) < @currentMonthStart
    AND    (
                  offmarketdate > Dateadd(month, 2,@DateOfMonthEnd)
           OR     offmarketdate IS NULL)
    AND    city='New York'
    AND    listprice >= 0
    AND    listprice <= 999999999
    AND    category1='RESI'
    AND    category2 IN('D','A','S','R','O')
     )

I've only included 3 months in this code for brevity, but this quickly gets unwieldly if I want to run 120 months. It's also slow, but I'm not sure how to re-word it to use a window function or group by clause as a single listing might be counted in both January and February if it took a while to sell.

The case statement is to allow the code to be used to calculate year-to-date averages as well where the code might include a future month with no data and that should be excluded from the average, (but it wouldn't be necessary if the entire thing can be re-coded in a format that accepts a start and end month rather than hardcoding 120 queries.)

edit from comments discussion:

execution plan: https://brentozar.com/pastetheplan/?id=rJtyNali5
table structure and indexes: https://pastebin.com/UxRTSDAx

This table is a copy of the live data made daily and not written to mid-day, any indexes can be added or removed freely to speed up the query, (but the example where clauses for city, list price, etc are only examples, we have the indexes we have so far because we might search on different clauses there.)

CodePudding user response：

Try a covering index on (city, category1, category2, originalentrytimestamp) including the offmarketdate and listprice columns. DDL like this might create the index you need.

CREATE NONCLUSTERED INDEX [CityCategoriesTimestamp] ON [dbo].[listings]
(
    [city] ASC,
    [category1] ASC,
    [category2] ASC,
    [originalentrytimestamp] ASC
)
INCLUDE (
    [offmarketdate] ASC,
    [listprice] ASC
);

This works because the index can be scanned sequentially from the first eligible row to the last. That's faster than scanning the whole table.

And, consider reworking your query to use EOMONTH(). Get rid of the big UNION ALL cascade, instead doing GROUP BY(originalentrytimestamp). I'd suggest a rewrite but I'm not sure I understand your business logic well enough to do that.

CodePudding user response：

I ended up getting around this by doing a pre-calculation of these inventory levels for a bunch of possible criteria. It's not ideal as it limits what other criteria I can apply to this search and the pre-calc table is taking a significant amount of space, but it will work for the specific business case I'm working with.