Problem statement: I need to copy a ~100GB database from a GCP Postgres DB into GCP BigQuery, both located in the same region/availability zone. Ideally, I'd like to be able to make a full copy every 15-20 minutes.
Attempt at the solution:
- Using Airbyte --> takes several hours, which is way too slow.
- Using pg_dump --> also didn't quite make the mark.
Question: What else can I try? What other tools are available that would enable me to do this? I think the issue here is network transfer speed: I seem to be getting only about 2 Mb/s, which is super slow.
Any suggestions on how I can make this fast in GCP?
CodePudding user response:
If you use Cloud SQL for Postgres, the easiest way is to use federated queries: create a connection in BigQuery to your Cloud SQL instance, so that BigQuery can query Cloud SQL directly.
Then perform an INSERT INTO ... SELECT, and your data is copied.
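A minimal sketch of what that could look like, assuming a federated connection named `my_cloudsql_connection` in `us-central1`, a BigQuery dataset `analytics`, and a Postgres table `orders` (all of these names are placeholders, not values from the question):

```sql
-- Copy rows from Cloud SQL Postgres into BigQuery via a federated query.
-- Project, dataset, table, and connection names below are placeholders.
INSERT INTO `my-project.analytics.orders`
SELECT *
FROM EXTERNAL_QUERY(
  'my-project.us-central1.my_cloudsql_connection',  -- BigQuery connection to Cloud SQL
  'SELECT * FROM public.orders'                     -- query executed inside Postgres
);
```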
Note: you must have a public IP on your Cloud SQL instance (you can keep the authorized networks list empty, so your database stays locked down and protected from the wild internet).
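Since the goal is a fresh full copy every 15-20 minutes, one option (same placeholder names, still just a sketch) is to replace the destination table on each run instead of appending:

```sql
-- Full refresh: overwrite the BigQuery table with a new snapshot on each run.
-- All project, dataset, table, and connection names are placeholders.
CREATE OR REPLACE TABLE `my-project.analytics.orders` AS
SELECT *
FROM EXTERNAL_QUERY(
  'my-project.us-central1.my_cloudsql_connection',
  'SELECT * FROM public.orders'
);
```

How close this gets to the 15-20 minute window depends mainly on how fast the Cloud SQL instance can stream the rows out, so it is worth timing it on your largest tables first.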