Home > Net >  MySQL fill NULL row with value based on the nearest value in another row
MySQL fill NULL row with value based on the nearest value in another row

Time:01-25

I'm trying to update rows in a scores table based on the following logic:

  • Get the feat_sum for ids that don't have a score.
  • For each feat_sum that has a NULL score, get the row with the nearest feat_sum and score and then update the score field to that score.
  • If feat_sum difference is identical, chose the smaller score
  • id is the PK of the table

Initial table:

scores

| id  | feat_sum | score |
| --- | ---      | ---   |
| 1   | 1.234    | 341   |
| 2   | 5.678    | 758   |
| 3   | 2.234    |  NULL |
| 4   | 8.678    |  NULL |

Expected output after query:

scores

| id  | feat_sum | score |
| --- | ---      | ---   |
| 1   | 1.234    | 341   |
| 2   | 5.678    | 758   |
| 3   | 2.234    | 341   | 
| 4   | 8.678    | 758   |

e.g. 1.234 is closer to 2.234 than 5.678 is to 2.234, therefore, the score for 2.234 should be 341.

I think I've got the base query here, but I'm struggling to put the last bit together.

SELECT 
  id,
  feat_sum,
  CASE 
    WHEN score IS NULL
    THEN (SELECT score FROM scores WHERE feat_sum - some_other_feat_sum /* has smallest difference */
  END AS score
FROM scores;

CodePudding user response:

In the UPDATE statement join to the table a CTE that returns for each id with a score that is null the closest score by utilizing the window function FIRST_VALUE():

WITH cte AS (
  SELECT DISTINCT s1.id, 
         FIRST_VALUE(s2.score) OVER (PARTITION BY s1.id ORDER BY ABS(s2.feat_sum - s1.feat_sum), s2.score) AS score
  FROM scores s1 JOIN scores s2
  WHERE s1.score IS NULL AND s2.score IS NOT NULL
)
UPDATE scores s 
INNER JOIN cte c ON c.id = s.id
SET s.score = c.score;

See the demo.

CodePudding user response:

I SQL Server, I'd use an APPLY operation, like this:

SELECT s.id, s.feat_sum, COALESCE(s.score, alt.score) as score
FROM scores s
CROSS APPLY (
    SELECT TOP 1 score 
    FROM scores s0
    WHERE s0.score IS NOT NULL
    ORDER BY abs(s0.feat_sum - s.feat_sum)
) alt

Other databases call this a lateral join. I know MySQL supports this, but the documentation is not clear to me (it only shows the old bad A,B syntax), so this might not be quite right:

SELECT s.id, s.feat_sum, COALESCE(s.score, alt.score) as score
FROM scores s
JOIN LATERAL (
    SELECT score 
    FROM scores s0
    WHERE s0.score IS NOT NULL
    ORDER BY abs(s0.feat_sum - s.feat_sum)
    LIMIT 1
) alt

Most lateral joins that also use a LIMIT 1 can be re-written to run even faster using a windowing function instead. I haven't looked that far ahead yet on this query.

CodePudding user response:

My thought process is as follows,

  1. get the rows with score = NULL
  2. calculate the difference of the rows above with all other rows and rank them based on difference, score
  3. join the rows with least difference with original table to get the score
  4. and finally a left join which conditionally shows the scores.
with noscore as 
(
    select * 
    from Table1 
    where score is null
),
alldiff as 
(
    select t1.id, t2.id as diffid, abs(t2.feat_sum-t1.feat_sum) as diff, t2.score 
    from noscore t1 
    inner join Table1 t2 on  
    t1.id != t2.id and t2.score is not null 
    order by diff,score asc
),
diff as 
(
    select *, row_number() over (partition by id order by diff asc) as nr 
    from alldiff
),
mindiff as 
(
    select df.id, t1.feat_sum, t1.score 
    from diff df 
    inner join Table1 t1 on df.diffid = t1.id
    where df.nr = 1
)
select t1.id, t1.feat_sum, if(t1.score is null,md.score,t1.score) as score 
from Table1 t1 
left join mindiff md on t1.id = md.id;

There may be a scope for query optimisation but I guess this works for now. You can use an update join statement for updating the rows.

Try it out here

  • Related