I would like to pull 30K rows at random from our data store to create one data set, then 30K more rows for a second data set that doesn't overlap any ids with the first.
My idea for how to make it would would be to somehow reference the id columns pulled in the first subquery when drawing a second subquery, then return their union:
SELECT * FROM (
SELECT id_col, A, B, C, 'group1' as label
FROM my_db
LIMIT 30000
) as t1
UNION ALL
(
SELECT id_col, A, B, C, 'group2' as label
FROM my_db
WHERE id_col NOT IN t1.id_col
LIMIT 30000
) as t2
But this does not work as I get "syntax error at or near t1" .
Updated: add label to column to show how a union would have created a tall format for the two groups.
CodePudding user response:
As klin pointed out in the comments, you would need to use a Common Table Expression (CTE) in order to achieve your desired result:
WITH t1 AS (
SELECT id_col, A, B, C, 'group1' AS label
FROM my_db
LIMIT 30000
), t2 AS (
SELECT id_col, A, B, C, 'group2' AS label
FROM my_db
WHERE id_col NOT IN (SELECT id_col FROM t1)
LIMIT 30000
)
SELECT id_col, A, B, C, label
FROM t1
UNION
SELECT id_col, A, B, C, label
FROM t2
You also would not need to do a UNION ALL, a UNION should suffice.