DeCorrelated SubQueries in Google BigQuery?-CodePudding

I have been struggling with a problem for hours. I have found myself down multiple rabbit holes and into the realms of DeCorrelated SubQueries which are frankly beyond me...

I have two tables and I'm trying to pull from both without a common column to join against. I need to take the a value from table 1, find the closest value (that is lower) in table 2 and then pull related data from table 2.

table_1

id	score
1	99.983545
2	98.674359
3	97.832475
4	96.184545
5	93.658572
6	89.963544
7	87.427353
8	82.883345

table_2

average_level	percentile
99.743545	99
97.994359	98
97.212485	97
96.987545	96
95.998573	95
88.213584	94
87.837384	93
80.982147	92

From the two tables above I need to:

Take the id and score
identify the closest average_level to the score
include the correlating average_level and percentile

The hoped for output would look like this...

id	score	average_level	percentile
1	99.983545	99.743545	99
2	98.674359	97.994359	98
3	97.832475	97.212485	97
4	96.184545	95.998573	95
5	93.658572	88.213584	94
6	89.963544	88.213584	94
7	87.427353	80.982147	92
8	82.883345	80.982147	92

Any help or advice would be very much appreciated

CodePudding user response：

You can do this by joining both tables with table_1.score >= table_2.average_level and then getting the max(average_level) and max(average_level) - which will be the closet yet inferior or equal values from table_2 - and grouping by the fields in table_1:

SELECT TABLE_1.ID, TABLE_1.SCORE, 
MAX(TABLE_2.AVERAGE_LEVEL) AS AVERAGE_LEVEL, 
MAX(TABLE_2.PERCENTILE) AS PERCENTILE
FROM TABLE_1 INNER JOIN TABLE_2
ON TABLE_1.SCORE >= TABLE_2.AVERAGE_LEVEL
GROUP BY TABLE_1.ID, TABLE_1.SCORE
ORDER BY TABLE_1.ID

I add the fiddle example here, it also includes @Ömer's answer

CodePudding user response：

if we say first table score and second one avarage you can try this.

select *
from Score s
inner join average a on a.Percentile = (select top(1) al.Percentile from average al order by Abs(average_level - s.score))

enter image description here

CodePudding user response：

Use lag function on percentile table to attach the values from next higher row e.g.:

l_level	l_percentile	u_level	u_percentile
99.743545	99	null	null
97.994359	98	99.743545	99
97.212485	97	97.994359	98
96.987545	96	97.212485	97

Now join it with score table using the condition:

on score >= l_level
and (score < u_level or u_level is null)

This will join each score with exactly one range. In order to find whether the score value is closer to the lower bound or upper bound, just use case:

case when u_level - score < score - l_level then u_level else l_level end,
case when u_level - score < score - l_level then u_percentile else l_percentile end,

Window functions should, in theory, give you better performance than self joins.

DBFiddle using SQL Server