I have been working with some big data in SQL/BigQuery and found that it has some holes in it that need to be filled with values in order to complete the dataset. What I'm struggling with is how to insert the missing values properly.
Say that I have multiple levels of a variable (1, 2, 3...
no upper bound) and for each of these levels, they should have an A, B, C
value. Some of these records will have data, others will not.
Current dataset:
level value data
1 A 1a_data
1 B 1b_data
1 C 1c_data
2 A 2a_data
2 C 2c_data
3 B 3b_data
What I want the dataset to look like:
level value data
1 A 1a_data
1 B 1b_data
1 C 1c_data
2 A 2a_data
2 B NULL
2 C 2c_data
3 A NULL
3 B 3b_data
3 C NULL
What would be the best way to do this?
CodePudding user response:
You need a CROSS
join of the distinct level
s with the distinct value
s and a LEFT
join to the table:
SELECT l.level, v.value, t.data
FROM (SELECT DISTINCT level FROM tablename) l
CROSS JOIN (SELECT DISTINCT value FROM tablename) v
LEFT JOIN tablename t ON t.level = l.level AND t.value = v.value
ORDER BY l.level, v.value;
See the demo.
CodePudding user response:
We can use an INSERT INTO ... SELECT
with the help of a calendar table:
INSERT INTO yourTable (level, value, data)
SELECT t1.level, t2.value, NULL
FROM (SELECT DISTINCT level FROM yourTable) t1
CROSS JOIN (SELECT DISTINCT value FROM yourTable) t2
LEFT JOIN yourTable t3
ON t3.level = t1.level AND
t3.value = t2.value
WHERE t3.data IS NULL;