Each product has a LatLng
. Over time, sometimes products can be duplicated but it is difficult to spot. The price or images may be slightly different.
I want to compare Products
by their ProductType
and their Price
in each location and put them on a map so duplicates can be found more easily.
For each product I calculate 5% of its price so I can then add/subtract this from other Product prices to find rough matches.
Products
ProductID ProductType Price Latitude Longitude
--------------------------------------------------------------------
ABC Red Widget 500 12.34 67.89
DEF Red Widget 505 12.34 67.89
MNO Red Widget 480 12.34 67.89
RST Red Widget 500 12.34 67.89
UVW Red Widget 300 12.34 67.89
JKL Blue Widget 800 76.54 32.10
XYZ Blue Widget 800 45.67 23.45
The expected result is that ABC
, DEF
, MNO
AND RST
should be returned because they are all Red Widgets
, are in the same location and are have either exactly the same price or prices within 5% of each other's price.
UVW
is outside the percentage so likely not a duplicate and should not be returned.
JKL
and XYZ
are Blue Widgets
with the same price, but not in the same location, and thus not returned.
SQL
This is what I have so far but it is returning far too many results. I think it's returning rows that are matching outside of each location. The ANDs in the JOIN look correct to me. It feels like the ORs may be allowing more flexibility in matching but the enclosing brackets look correct...
WITH cte AS
(
SELECT
p.ProductID,
p.ProductType,
p.Price,
(p.Price / 100) * 5 AS PricePercent,
ROUND(p.Latitude, 3) AS Latitude,
ROUND(p.Longitude, 3) AS Longitude
FROM
Products p
WHERE
p.Latitude IS NOT NULL AND p.Longitude IS NOT NULL
)
SELECT
DISTINCT a.ProductID,
a.Price,
a.Latitude, a.Longitude
FROM
cte a
INNER JOIN
/* ProductIDs don't match */
cte b ON a.ProductID <> b.ProductID
/* match only where location is the same*/
AND a.Latitude = b.Latitude
AND a.Longitude = b.Longitude
/* match only where Product Type is the same*/
AND a.ProductType = b.ProductType
/*match only if price is the same, or within 5% above or 5% below price*/
AND (/*same price*/
b.Price = a.Price
OR
/*b.price is within percentage over a.price*/
(b.Price > a.Price AND b.Price < (a.Price a.PricePercent))
OR
/*b.price is within percentage under a.price */
(b.Price < a.Price AND b.Price >(a.Price-a.PricePercent)))
CodePudding user response:
First, this thing :
AND (/*same price*/
b.Price = a.Price
OR
/*b.price is within percentage over a.price*/
(b.Price > a.Price AND b.Price < (a.Price a.PricePercent))
OR
/*b.price is within percentage under a.price */
(b.Price < a.Price AND b.Price >(a.Price-a.PricePercent))
)
Is (or should be) the same as this :
AND (
-- price within range /- percent
b.Price < (a.Price a.PricePercent)
AND b.Price >(a.Price-a.PricePercent)
)
And second for me it work... :
MS SQL Server 2017 Schema Setup:
Query 1:
WITH cte AS
(
SELECT
p.ProductID,
p.ProductType,
p.Price,
(p.Price / 100) * 5 AS PricePercent,
ROUND(p.Latitude, 3) AS Latitude,
ROUND(p.Longitude, 3) AS Longitude
FROM
Products p
WHERE
p.Latitude IS NOT NULL AND p.Longitude IS NOT NULL
)
SELECT
DISTINCT a.ProductID,
a.Price,
a.Latitude, a.Longitude
FROM
cte a
INNER JOIN
/* ProductIDs don't match */
cte b ON a.ProductID <> b.ProductID
/* match only where location is the same*/
AND a.Latitude = b.Latitude
AND a.Longitude = b.Longitude
/* match only where Product Type is the same*/
AND a.ProductType = b.ProductType
/*match only if price is the same, or within 5% above or 5% below price*/
AND (
-- price within range /- percent
b.Price < (a.Price a.PricePercent)
AND b.Price >(a.Price-a.PricePercent)
)
| ProductID | Price | Latitude | Longitude |
|-----------|-------|----------|-----------|
| ABC | 500 | 12 | 68 |
| DEF | 505 | 12 | 68 |
| RST | 500 | 12 | 68 |