Home > Back-end >  Running a query for multiple datasets only different by ID (Big Query)
Running a query for multiple datasets only different by ID (Big Query)

Time:05-02

I need to run the same select statement for multiple datasets in Big Query only different by ID (1568 - 3257 - 8432 etc.) This select statement also uses a join across tables from the same dataset. My question is how can we do this without using UNION ALL ?

SELECT * 
FROM test.1568.activity
INNER JOIN test.1568.employee on (activity.employee_id = employee.id)

Is there any possibility that we can create an array containing the id of different datasets, and then using a loop to process that statement for each dataset?

Thanks in advance.

CodePudding user response:

you can create a script for it, which loops through and runs the query, like this:

DECLARE datasets ARRAY<STRING>;
DECLARE i INT64 DEFAULT 0;
DECLARE datasets_length INT64;

SET datasets = (SELECT ARRAY_AGG(schema_name) FROM `your-project-id.INFORMATION_SCHEMA.SCHEMATA`);
SET datasets_length = (SELECT ARRAY_LENGTH(datasets));

WHILE i < datasets_length DO
  EXECUTE IMMEDIATE format("""
    SELECT COUNT(1) table_count FROM `your-project-id.%s.__TABLES__`
  """, datasets[OFFSET(i)]);
  SET i = i   1;
END WHILE;

This example counts the tables in each dataset in a project. You can change the in the EXECUTE IMMEDIATE statement for what you would like to run on every dataset.

  • Related