Home > OS >  How do I get the closest match vlookup for entire column in Google Big Query SQL?
How do I get the closest match vlookup for entire column in Google Big Query SQL?

Time:12-30

I am trying to take a column of original prices and enter a discount % and return the closest match to a predetermined set of values. These allowable values are found in another table that is just one column of prices. I am curious to hear how ties would be handled. Please note that this is for a long list of items, so this would have to apply to an entire column. The specific syntax needed is Google Big Query.

I envision this functioning similarly to excel's VLOOKUP approximate = 1. In practice, I will apply the same solution to multiple price points in the results table (ex. origPrice, 25%off, 50%off, and 75%off etc. ), but I figured that I could copy-paste the solution multiple times.

The below example shows a 50% price reduction.

allowableDiscounts

discountPrice
$51.00
$48.50
$40.00

productInfo

Item OrigPrice
Apple $100.00
Banana $ 98.00

Desired Output

Item OrigPrice exact50off closestMatch
Apple $100.00 $50.00 $51.00
Banana $ 98.00 $44.00 $40.00

I have researched solutions here and elsewhere. Most of what I found suggested sorting the allowableDiscounts table by the absolute value of the difference between exact50off and discountPrice. That worked great for one instance, but I could not figure out how to apply that to an entire list of prices.

I have workarounds both in SQL and excel that can accomplish the same task manually, but I am looking for something to match the above function so that way if the allowableDiscounts table changes, the calculations will reflect that without recoding.

SELECT
   p.Item,
   p.OrigPrice,
   p.OrigPrice * 0.5 AS exact50off
   --new code from allowableDiscounts.discountPrice
FROM
   productInfo AS p
WHERE
   --filters applied as needed

CodePudding user response:

You may work it out with a CROSS JOIN, then compute the smallest difference and filter out the other generated records (with higher differences).

Smallest difference here is retrieved by assigning a rank to all differences in each partition <Item, OrigPrice> (with ROW_NUMBER), then all values ranked higher than 1 are discarded.

WITH cte AS (
    SELECT *,
           OrigPrice*0.5 AS exact50off,
           ROW_NUMBER() OVER(PARTITION BY Item, OrigPrice ORDER BY ABS(discountPrice - OrigPrice*0.5)) AS rn
    FROM productInfo
    CROSS JOIN allowableDiscounts
)
SELECT Item, 
       OrigPrice, 
       exact50off,
       discountPrice
FROM cte
WHERE rn = 1

CodePudding user response:

In case the tables are large, as you stated, a cross join is not possible and a window function is the only solution.

First we generate a function nearest, which return the element (x or y) closest to a target value.

Then we define both tables, discountPrice and productInfo. Next, we union these tables as helper. The first column tmp holds the value 1, if the data is from the main table productInfo and we calculate the column exact50off. For the table discountPrice the tmp column in set to 0 and the exact50off column is filled with the entries discountPrice. We add the table discountPrice again, but for column exact75off.

We query the helper table and use:

last_value(if(tmp=0,exact50off,null) ignore nulls) over (order by exact50off),
  • tmp=0 : Keep only entries from the table discountPrice
  • last_value get nearest lowest value from table discountPrice

We run the same again, but with desc to obtain the nearest highest value.

The function nearest yields the nearest values of both.

Analog this is done for exact75off

create temp function nearest(target any type,x any type, y any type) as (if(abs(target-x)>abs(target-y),y,x) );

with allowableDiscounts as (select * from  unnest([51,48.5,40,23,20]) as discountPrice ),
productInfo as (select "Apple" as item, 100 as OrigPrice union all select "Banana",98 union all select "Banana cheap",88),

helper as (
  select 1 as tmp, # this column holds the info from which table the rows come forme
  item,OrigPrice, # all colummns of the table productInfo (2)
  OrigPrice/2 as exact50off, # calc 50%
  OrigPrice*0.25 as exact75off, # calc 75%
   from  productInfo
  
  union all # table for 50%
  select 0 as tmp, 
  null,null, # (2) null entries, because the table productInfo has two columns (2)
  discountPrice as exact50off, #possible values for 50% off
  null # other calc (75%)
  from  allowableDiscounts 

  union all # table for 75%
  select 0 as tmp, 
  null,null, # (2) null entries, because the table productInfo has two columns (2)
  null, # other calc (50%)
  discountPrice, #possible values for 75% off  
  from  allowableDiscounts 
  
  )

  select *,
  nearest(exact50off,
  last_value(if(tmp=0,exact50off,null) ignore nulls) over (order by exact50off),
  last_value(if(tmp=0,exact50off,null) ignore nulls) over (order by exact50off desc)
  ) as  closestMatch50off,
  nearest(exact75off,
  last_value(if(tmp=0,exact75off,null) ignore nulls) over (order by exact75off),
  last_value(if(tmp=0,exact75off,null) ignore nulls) over (order by exact75off desc)
  ) as  closestMatch75off,

  from helper
 qualify tmp=1
  order by exact50off

CodePudding user response:

Use the ABS(X) function to compute the absolute values between the columns in the tables to make a match as an exact match or a difference in values between 1 and 4 for the various discount values as below, use a LEFT JOIN to get allow values in your leading table productInfo and either matching values or NULL from the allowableDiscounts table.

SELECT
   p.Item,
   p.OrigPrice,
   p.OrigPrice * 0.5 AS exact50off,
   p.OrigPrice * 0.25 AS exact25off,
   p.OrigPrice * 0.75 AS exact75off,
   q.discountPrice AS closestMatch
FROM
   productInfo AS p
JOIN allowableDiscounts q on ABS(p.OrigPrice * 0.50 - q.discountPrice) = 0
OR ABS(p.OrigPrice * 0.50 - q.discountPrice) BETWEEN 0.01 AND 4.0
OR ABS(p.OrigPrice * 0.25 - q.discountPrice) = 0
OR ABS(p.OrigPrice * 0.75 - q.discountPrice) = 0 
OR ABS(p.OrigPrice * 0.25 - q.discountPrice) BETWEEN 0.01 AND 4.0
OR ABS(p.OrigPrice * 0.75 - q.discountPrice) BETWEEN 0.01 AND 4.0;
  • Related