I'm trying to update rows in a scores
table based on the following logic:
- Get the
feat_sum
for ids that don't have ascore
. - For each
feat_sum
that has a NULLscore
, get the row with the nearestfeat_sum
andscore
and then update thescore
field to that score. - If
feat_sum
difference is identical, chose the smaller score - id is the PK of the table
Initial table:
scores
| id | feat_sum | score |
| --- | --- | --- |
| 1 | 1.234 | 341 |
| 2 | 5.678 | 758 |
| 3 | 2.234 | NULL |
| 4 | 8.678 | NULL |
Expected output after query:
scores
| id | feat_sum | score |
| --- | --- | --- |
| 1 | 1.234 | 341 |
| 2 | 5.678 | 758 |
| 3 | 2.234 | 341 |
| 4 | 8.678 | 758 |
e.g. 1.234 is closer to 2.234 than 5.678 is to 2.234, therefore, the score
for 2.234 should be 341.
I think I've got the base query here, but I'm struggling to put the last bit together.
SELECT
id,
feat_sum,
CASE
WHEN score IS NULL
THEN (SELECT score FROM scores WHERE feat_sum - some_other_feat_sum /* has smallest difference */
END AS score
FROM scores;
CodePudding user response:
In the UPDATE
statement join to the table a CTE that returns for each id
with a score
that is null
the closest score
by utilizing the window function FIRST_VALUE()
:
WITH cte AS (
SELECT DISTINCT s1.id,
FIRST_VALUE(s2.score) OVER (PARTITION BY s1.id ORDER BY ABS(s2.feat_sum - s1.feat_sum), s2.score) AS score
FROM scores s1 JOIN scores s2
WHERE s1.score IS NULL AND s2.score IS NOT NULL
)
UPDATE scores s
INNER JOIN cte c ON c.id = s.id
SET s.score = c.score;
See the demo.
CodePudding user response:
I SQL Server, I'd use an APPLY operation, like this:
SELECT s.id, s.feat_sum, COALESCE(s.score, alt.score) as score
FROM scores s
CROSS APPLY (
SELECT TOP 1 score
FROM scores s0
WHERE s0.score IS NOT NULL
ORDER BY abs(s0.feat_sum - s.feat_sum)
) alt
Other databases call this a lateral join. I know MySQL supports this, but the documentation is not clear to me (it only shows the old bad A,B syntax), so this might not be quite right:
SELECT s.id, s.feat_sum, COALESCE(s.score, alt.score) as score
FROM scores s
JOIN LATERAL (
SELECT score
FROM scores s0
WHERE s0.score IS NOT NULL
ORDER BY abs(s0.feat_sum - s.feat_sum)
LIMIT 1
) alt
Most lateral joins that also use a LIMIT 1
can be re-written to run even faster using a windowing function instead. I haven't looked that far ahead yet on this query.
CodePudding user response:
My thought process is as follows,
- get the rows with score = NULL
- calculate the difference of the rows above with all other rows and rank them based on difference, score
- join the rows with least difference with original table to get the score
- and finally a left join which conditionally shows the scores.
with noscore as
(
select *
from Table1
where score is null
),
alldiff as
(
select t1.id, t2.id as diffid, abs(t2.feat_sum-t1.feat_sum) as diff, t2.score
from noscore t1
inner join Table1 t2 on
t1.id != t2.id and t2.score is not null
order by diff,score asc
),
diff as
(
select *, row_number() over (partition by id order by diff asc) as nr
from alldiff
),
mindiff as
(
select df.id, t1.feat_sum, t1.score
from diff df
inner join Table1 t1 on df.diffid = t1.id
where df.nr = 1
)
select t1.id, t1.feat_sum, if(t1.score is null,md.score,t1.score) as score
from Table1 t1
left join mindiff md on t1.id = md.id;
There may be a scope for query optimisation but I guess this works for now. You can use an update join
statement for updating the rows.