I need to run the same select statement for multiple datasets in Big Query only different by ID (1568 - 3257 - 8432 etc.) This select statement also uses a join across tables from the same dataset. My question is how can we do this without using UNION ALL ?
SELECT *
FROM test.1568.activity
INNER JOIN test.1568.employee on (activity.employee_id = employee.id)
Is there any possibility that we can create an array containing the id of different datasets, and then using a loop to process that statement for each dataset?
Thanks in advance.
CodePudding user response:
you can create a script for it, which loops through and runs the query, like this:
DECLARE datasets ARRAY<STRING>;
DECLARE i INT64 DEFAULT 0;
DECLARE datasets_length INT64;
SET datasets = (SELECT ARRAY_AGG(schema_name) FROM `your-project-id.INFORMATION_SCHEMA.SCHEMATA`);
SET datasets_length = (SELECT ARRAY_LENGTH(datasets));
WHILE i < datasets_length DO
EXECUTE IMMEDIATE format("""
SELECT COUNT(1) table_count FROM `your-project-id.%s.__TABLES__`
""", datasets[OFFSET(i)]);
SET i = i 1;
END WHILE;
This example counts the tables in each dataset in a project. You can change the in the EXECUTE IMMEDIATE
statement for what you would like to run on every dataset.