I am trying to take a column of original prices and enter a discount % and return the closest match to a predetermined set of values. These allowable values are found in another table that is just one column of prices. I am curious to hear how ties would be handled. Please note that this is for a long list of items, so this would have to apply to an entire column. The specific syntax needed is Google Big Query.
I envision this functioning similarly to excel's VLOOKUP approximate = 1. In practice, I will apply the same solution to multiple price points in the results table (ex. origPrice, 25%off, 50%off, and 75%off etc. ), but I figured that I could copy-paste the solution multiple times.
The below example shows a 50% price reduction.
allowableDiscounts
discountPrice |
---|
$51.00 |
$48.50 |
$40.00 |
productInfo
Item | OrigPrice |
---|---|
Apple | $100.00 |
Banana | $ 98.00 |
Desired Output
Item | OrigPrice | exact50off | closestMatch |
---|---|---|---|
Apple | $100.00 | $50.00 | $51.00 |
Banana | $ 98.00 | $44.00 | $40.00 |
I have researched solutions here and elsewhere. Most of what I found suggested sorting the allowableDiscounts table by the absolute value of the difference between exact50off and discountPrice. That worked great for one instance, but I could not figure out how to apply that to an entire list of prices.
I have workarounds both in SQL and excel that can accomplish the same task manually, but I am looking for something to match the above function so that way if the allowableDiscounts table changes, the calculations will reflect that without recoding.
SELECT
p.Item,
p.OrigPrice,
p.OrigPrice * 0.5 AS exact50off
--new code from allowableDiscounts.discountPrice
FROM
productInfo AS p
WHERE
--filters applied as needed
CodePudding user response:
You may work it out with a CROSS JOIN
, then compute the smallest difference and filter out the other generated records (with higher differences).
Smallest difference here is retrieved by assigning a rank to all differences in each partition <Item, OrigPrice> (with ROW_NUMBER
), then all values ranked higher than 1 are discarded.
WITH cte AS (
SELECT *,
OrigPrice*0.5 AS exact50off,
ROW_NUMBER() OVER(PARTITION BY Item, OrigPrice ORDER BY ABS(discountPrice - OrigPrice*0.5)) AS rn
FROM productInfo
CROSS JOIN allowableDiscounts
)
SELECT Item,
OrigPrice,
exact50off,
discountPrice
FROM cte
WHERE rn = 1
CodePudding user response:
In case the tables are large, as you stated, a cross join is not possible and a window function is the only solution.
First we generate a function nearest
, which return the element (x or y) closest to a target value.
Then we define both tables, discountPrice and productInfo. Next, we union these tables as helper
. The first column
tmp
holds the value 1
, if the data is from the main table productInfo
and we calculate the column exact50off
. For the table discountPrice the tmp
column in set to 0
and the exact50off
column is filled with the entries discountPrice
. We add the table discountPrice again, but for column exact75off.
We query the helper
table and use:
last_value(if(tmp=0,exact50off,null) ignore nulls) over (order by exact50off),
tmp=0
: Keep only entries from the table discountPricelast_value
get nearest lowest value from table discountPrice
We run the same again, but with desc
to obtain the nearest highest value.
The function nearest
yields the nearest values of both.
Analog this is done for exact75off
create temp function nearest(target any type,x any type, y any type) as (if(abs(target-x)>abs(target-y),y,x) );
with allowableDiscounts as (select * from unnest([51,48.5,40,23,20]) as discountPrice ),
productInfo as (select "Apple" as item, 100 as OrigPrice union all select "Banana",98 union all select "Banana cheap",88),
helper as (
select 1 as tmp, # this column holds the info from which table the rows come forme
item,OrigPrice, # all colummns of the table productInfo (2)
OrigPrice/2 as exact50off, # calc 50%
OrigPrice*0.25 as exact75off, # calc 75%
from productInfo
union all # table for 50%
select 0 as tmp,
null,null, # (2) null entries, because the table productInfo has two columns (2)
discountPrice as exact50off, #possible values for 50% off
null # other calc (75%)
from allowableDiscounts
union all # table for 75%
select 0 as tmp,
null,null, # (2) null entries, because the table productInfo has two columns (2)
null, # other calc (50%)
discountPrice, #possible values for 75% off
from allowableDiscounts
)
select *,
nearest(exact50off,
last_value(if(tmp=0,exact50off,null) ignore nulls) over (order by exact50off),
last_value(if(tmp=0,exact50off,null) ignore nulls) over (order by exact50off desc)
) as closestMatch50off,
nearest(exact75off,
last_value(if(tmp=0,exact75off,null) ignore nulls) over (order by exact75off),
last_value(if(tmp=0,exact75off,null) ignore nulls) over (order by exact75off desc)
) as closestMatch75off,
from helper
qualify tmp=1
order by exact50off
CodePudding user response:
Use the ABS(X) function to compute the absolute values between the columns in the tables to make a match as an exact match or a difference in values between 1 and 4 for the various discount values as below, use a LEFT JOIN to get allow values in your leading table productInfo and either matching values or NULL from the allowableDiscounts table.
SELECT
p.Item,
p.OrigPrice,
p.OrigPrice * 0.5 AS exact50off,
p.OrigPrice * 0.25 AS exact25off,
p.OrigPrice * 0.75 AS exact75off,
q.discountPrice AS closestMatch
FROM
productInfo AS p
JOIN allowableDiscounts q on ABS(p.OrigPrice * 0.50 - q.discountPrice) = 0
OR ABS(p.OrigPrice * 0.50 - q.discountPrice) BETWEEN 0.01 AND 4.0
OR ABS(p.OrigPrice * 0.25 - q.discountPrice) = 0
OR ABS(p.OrigPrice * 0.75 - q.discountPrice) = 0
OR ABS(p.OrigPrice * 0.25 - q.discountPrice) BETWEEN 0.01 AND 4.0
OR ABS(p.OrigPrice * 0.75 - q.discountPrice) BETWEEN 0.01 AND 4.0;