Home > Software engineering >  Insert missing values in column at all levels of another column in SQL?
Insert missing values in column at all levels of another column in SQL?

Time:07-20

I have been working with some big data in SQL/BigQuery and found that it has some holes in it that need to be filled with values in order to complete the dataset. What I'm struggling with is how to insert the missing values properly.

Say that I have multiple levels of a variable (1, 2, 3... no upper bound) and for each of these levels, they should have an A, B, C value. Some of these records will have data, others will not.

Current dataset:

level value data
1     A     1a_data
1     B     1b_data
1     C     1c_data
2     A     2a_data
2     C     2c_data
3     B     3b_data

What I want the dataset to look like:

level value data
1     A     1a_data
1     B     1b_data
1     C     1c_data
2     A     2a_data
2     B     NULL
2     C     2c_data
3     A     NULL
3     B     3b_data
3     C     NULL

What would be the best way to do this?

CodePudding user response:

You need a CROSS join of the distinct levels with the distinct values and a LEFT join to the table:

SELECT l.level, v.value, t.data
FROM (SELECT DISTINCT level FROM tablename) l
CROSS JOIN (SELECT DISTINCT value FROM tablename) v
LEFT JOIN tablename t ON t.level = l.level AND t.value = v.value
ORDER BY l.level, v.value;

See the demo.

CodePudding user response:

We can use an INSERT INTO ... SELECT with the help of a calendar table:

INSERT INTO yourTable (level, value, data)
SELECT t1.level, t2.value, NULL
FROM (SELECT DISTINCT level FROM yourTable) t1
CROSS JOIN (SELECT DISTINCT value FROM yourTable) t2
LEFT JOIN yourTable t3
    ON t3.level = t1.level AND
       t3.value = t2.value
WHERE t3.data IS NULL;
  • Related