Home > Mobile >  DeCorrelated SubQueries in Google BigQuery?
DeCorrelated SubQueries in Google BigQuery?

Time:03-04

I have been struggling with a problem for hours. I have found myself down multiple rabbit holes and into the realms of DeCorrelated SubQueries which are frankly beyond me...

I have two tables and I'm trying to pull from both without a common column to join against. I need to take the a value from table 1, find the closest value (that is lower) in table 2 and then pull related data from table 2.

table_1

id score
1 99.983545
2 98.674359
3 97.832475
4 96.184545
5 93.658572
6 89.963544
7 87.427353
8 82.883345

table_2

average_level percentile
99.743545 99
97.994359 98
97.212485 97
96.987545 96
95.998573 95
88.213584 94
87.837384 93
80.982147 92

From the two tables above I need to:

  1. Take the id and score
  2. identify the closest average_level to the score
  3. include the correlating average_level and percentile

The hoped for output would look like this...

id score average_level percentile
1 99.983545 99.743545 99
2 98.674359 97.994359 98
3 97.832475 97.212485 97
4 96.184545 95.998573 95
5 93.658572 88.213584 94
6 89.963544 88.213584 94
7 87.427353 80.982147 92
8 82.883345 80.982147 92

Any help or advice would be very much appreciated

CodePudding user response:

You can do this by joining both tables with table_1.score >= table_2.average_level and then getting the max(average_level) and max(average_level) - which will be the closet yet inferior or equal values from table_2 - and grouping by the fields in table_1:

SELECT TABLE_1.ID, TABLE_1.SCORE, 
MAX(TABLE_2.AVERAGE_LEVEL) AS AVERAGE_LEVEL, 
MAX(TABLE_2.PERCENTILE) AS PERCENTILE
FROM TABLE_1 INNER JOIN TABLE_2
ON TABLE_1.SCORE >= TABLE_2.AVERAGE_LEVEL
GROUP BY TABLE_1.ID, TABLE_1.SCORE
ORDER BY TABLE_1.ID

I add the fiddle example here, it also includes @Ömer's answer

CodePudding user response:

if we say first table score and second one avarage you can try this.

select *
from Score s
inner join average a on a.Percentile = (select top(1) al.Percentile from average al order by Abs(average_level - s.score))

enter image description here

CodePudding user response:

Use lag function on percentile table to attach the values from next higher row e.g.:

l_level l_percentile u_level u_percentile
99.743545 99 null null
97.994359 98 99.743545 99
97.212485 97 97.994359 98
96.987545 96 97.212485 97

Now join it with score table using the condition:

on score >= l_level
and (score < u_level or u_level is null)

This will join each score with exactly one range. In order to find whether the score value is closer to the lower bound or upper bound, just use case:

case when u_level - score < score - l_level then u_level else l_level end,
case when u_level - score < score - l_level then u_percentile else l_percentile end,

Window functions should, in theory, give you better performance than self joins.

DBFiddle using SQL Server

  • Related