I need to migrate data from RDS to BQ so I can run models on Vertex AI.
The tables from RDS need to be on BQ as fast as possible, with low sync delay between the main database and the BQ replica.
I want to create a trigger that when the database on RDS is updated, it will automatically update the BQ database.
I saw the BQ Data Transfer Service tool, could it work for this case?
Can I migrate more than one table per job on a trigger time basis?
CodePudding user response:
BigQuery Data Transfer Service is the tool available in GCP for redshift migration to Bigquery. For requirements such as prerequisites and permission for the migration, you may refer to this GCP documentation: https://cloud.google.com/bigquery-transfer/docs/redshift-migration#overview
For realtime update requirement from redshift to Bigquery, BigQuery Data Transfer only transfers on a scheduled, managed basis. GCP Documentation: https://cloud.google.com/bigquery-transfer/docs/introduction
For your requirement of migrating more than one table per job on a trigger time basis, BigQuery has a load quota of 15 TB, per load job, per table. You may refer to this document for Quotas and Limits: https://cloud.google.com/bigquery-transfer/docs/redshift-migration#quotas_and_limits. Given this limitation, GCP helps you to estimate how many load jobs are required by your transfers for efficency by coming up with this formula:
Number of daily jobs = Number of transfers x Number of tables x Schedule frequency x Refresh window
You may refer to this documentation for further explanation of this formula: https://cloud.google.com/bigquery-transfer/quotas#load_jobs