Home > OS >  How to pull rows from a SQL table until quotas for multiple columns are met?
How to pull rows from a SQL table until quotas for multiple columns are met?

Time:12-15

I've been able to find a few examples of questions similar to this one, but most only involve a single column being checked.

SQL Select until Quantity Met

Select rows until condition met

I have a large table representing facilities, with columns for each type of resource available and the number of those specific resources available per facility. I want this stored procedure to be able to take integer values in as multiple parameters (representing each of these columns) and a Lat/Lon. Then it should iterate over the table sorted by distance, and return all rows (facilities) until the required quantity of available resources (specified by the parameters) are met.

Data source example:

Id Lat Long Resource1 Resource2 ...
1 50.123 4.23 5 12 ...
2 61.234 5.34 0 9 ...
3 50.634 4.67 21 18 ...

Result Wanted:

@latQuery = 50.634

@LongQuery = 4.67

@res1Query = 10

@res2Query = 20

Id Lat Long Resource1 Resource2 ...
3 50.634 4.67 21 18 ...
1 50.123 4.23 5 12 ...

Result includes all rows that meet the queries individually. Result is also sorted by distance to the requested lat/lon


I'm able to sort the results by distance, and sum the total running values as suggested in other threads, but I'm having some trouble with the logic comparing the running values with the quota provided in the params.

First I have some CTEs to get most recent edits, order by distance and then sum the running totals

WITH cte1 AS (SELECT
  @origin.STDistance(geography::Point(Facility.Lat, Facility.Long, 4326)) AS distance,
  Facility.Resource1 as res1,
  Facility.Resource2 as res2
  -- ...etc
  FROM Facility
),
cte2 AS (SELECT 
  distance, 
  res1, 
  SUM(res1) OVER (ORDER BY distance) AS totRes1,
  res2, 
  SUM(res1) OVER (ORDER BY distance) AS totRes2
  -- ...etc, there's 15-20 columns here
  FROM cte1
)

Next, with the results of that CTE, I need to pull rows until all quotas are met. Having the issues here, where it works for one row but my logic with all the ANDs isn't exactly right.

SELECT * FROM cte2 WHERE (
  (totRes1 <= @res1Query OR (totRes1 > @res1Query AND totRes1- res1 <= @totRes1)) AND
  (totRes2 <= @res2Query OR (totRes2 > @res2Query AND totRes2- res2 <= @totRes2)) AND
  -- ... I also feel like this method of pulling the next row once it's over may be convoluted as well?
)

As-is right now, it's mostly returning nothing, and I'm guessing it's because it's too strict? Essentially, I want to be able to let the total values go past the required values until they are all past the required values, and then return that list.

Has anyone come across a better method of searching using separate quotas for multiple columns?


See my update in the answers/comments

CodePudding user response:

Was able to figure out the problem on my own here. The primary issue I was running into was that I was comparing 25 different columns' running totals versus the 25 stored proc parameters (quotas of resources required by the search).

Changing the lines such as these

(totRes1 <= @res1Query OR (totRes1 > @res1Query AND totRes1- res1 <= @totRes1)) AND --...

to

(totRes1 <= @res1Query OR (totRes1 > @res1Query AND totRes1- res1 <= @totRes1) OR @res1Query = 0) AND --...

(adding in the OR @res1Query = 0)solved my issue.

In other words, the search is often only for one or two columns (types of resources) - leaving others as zero. The way my logic was set up caused it to skip over lots of rows because it was instantly marking them as having met the quota (value less than or equal to the quota). like @A Neon Tetra suggested, was pretty close to it already.


Update:

First attempt didn't exactly fix my own issue. Posting the stripped down version of my code that is now working for me.

DECLARE @Lat AS DECIMAL(12,6)
DECLARE @Lon AS DECIMAL(12,6)
DECLARE @res1Query AS INT
DECLARE @res2Query AS INT
-- repeat for Resource 3 through 25, etc...

DECLARE @origin geography = geography::Point(@Lat, @Lon, 4326);

-- CTE to be able to expose distance 
cte AS (SELECT TOP(99999)  -- --> this is hacky, it won't let me order by distance unless I'm selecting TOP(x) or some other fn?
    dbo.Facility.FacilityGUID,
    dbo.Facility.Lat,
    dbo.Facility.Lon,
    @origin.STDistance(geography::Point(dbo.Facility.Lat, dbo.Facility.Lon, 4326)) 
    AS distance,
    dbo.Facility.Resource1 AS res1,
    dbo.Facility.Resource2 AS res2,
    -- repeat for Resource 3 through 25, etc...
FROM dbo.Facility
ORDER BY distance), 

-- third CTE - has access to distance so we can keep track of a running total ordered by distance
---> have to separate into two since you can't reference the same alias (distance) again within the same SELECT
fullCTE AS (SELECT 
    FacilityID,
    Lat, 
    Long, 
    distance, 
    res1, 
  SUM(res1) OVER (ORDER BY distance)AS totRes1,
    res2, 
  SUM(res2) OVER (ORDER BY distance)AS totRes2,
  -- repeat for Resource 3 through 25, etc...
FROM cte)

SELECT  * -- Customize what you're pulling here for your output as needed
FROM dbo.Facility INNER JOIN fullCTE ON (fullCTE.FacilityID = dbo.Facility.FacilityID)
WHERE EXISTS
    (SELECT 
        FacilityID
    FROM fullCTE WHERE (
        FacilityID = dbo.Facility.FacilityID AND
        -- Keep pulling rows until all conditions are met, as opposed to pulling rows while they're under the quota
        NOT (
          ((totRes1 - res1 >= @res1Query AND @res1Query <> 0) OR (@res1Query = 0)) AND
          ((totRes2 - res2 >= @res2Query AND @res2Query <> 0) OR (@res2Query = 0)) AND
          -- repeat for Resource 3 through 25, etc...
        )
  )
)

CodePudding user response:

I think you are massively over-complicating this. This does not need any joins, just some running sum calculations, and the right OR logic.

The key to solving this is that you need all rows, where the running sum up to the previous row is less than the requirement for all requirements. This means that you include all rows where the requirement has not been met, and the first row for which the requirement has been met or exceeded.

To do this you can subtract the current row's value from the running sum.

You could utilize a ROWS specification of ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING. But then you need to deal with NULL on the first row.

In any event, even a regular running sum should always use ROWS UNBOUNDED PRECEDING, because the default is RANGE UNBOUNDED PRECEDING, which is subtly different and can cause incorrect results, as well as being slower.

You can also factor out the distance calculation into a CROSS APPLY (VALUES, avoiding the need for lots of CTEs or derived tables. You now only need one level of derivation.

DECLARE @origin geography = geography::Point(@latQuery, @LongQuery, 4326);

SELECT
  f.Id,
  f.Lat,
  f.Long,
  f.Resource1,
  f.Resource2
FROM (
    SELECT f.*,
        SumRes1 = SUM(f.Resource1) OVER (ORDER BY v1.Distance ROWS UNBOUNDED PRECEDING) - f.Resource1,
        SumRes2 = SUM(f.Resource2) OVER (ORDER BY v1.Distance ROWS UNBOUNDED PRECEDING) - f.Resource2
    FROM Facility f
    CROSS APPLY (VALUES(
        @origin.STDistance(geography::Point(f.Lat, f.Long, 4326))
    )) v1(Distance)
) f
WHERE (
      f.SumRes1 < @res1Query
   OR f.SumRes2 < @res2Query
);

db<>fiddle

  • Related