Fastest way to copy over a postgres database to BigQuery


Problem statement: I need to copy a 100 GB database from a GCP Postgres instance into GCP BigQuery in the same region/zone. Ideally, I'd like to be able to make a full copy every 15-20 minutes.

Attempt at the solution:

  1. Using Airbyte --> takes several hours, which is way too slow.
  2. Using pg_dump --> also didn't quite make the mark.

Question: What else can I try? What other tools are available that would enable me to do this? I think the issue here is network transfer speed: I seem to be getting only about 2 Mb/s, which is super slow.

Any suggestions about how I can make this fast in GCP ?

CodePudding user response:

If you use Cloud SQL for Postgres, the easiest way is to use federated queries. You can create a connection in BigQuery to Cloud SQL and query Cloud SQL directly from BigQuery.

Perform an INSERT INTO ... SELECT, and your data is copied (see the sketch below).

Note that your Cloud SQL instance must have a public IP (you can keep the authorized networks list empty, so the database stays locked down and protected from the wild internet).
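
A minimal sketch of what that looks like in BigQuery SQL, assuming a Cloud SQL connection has already been created in BigQuery; the project, connection, dataset, and table names below are placeholders you would replace with your own:

```sql
-- Copy a Postgres table into BigQuery via a federated query.
-- 'my-project.us.my-cloudsql-connection' is a hypothetical BigQuery
-- connection resource pointing at the Cloud SQL instance.
INSERT INTO my_dataset.orders_copy
SELECT *
FROM EXTERNAL_QUERY(
  'my-project.us.my-cloudsql-connection',  -- BigQuery connection to Cloud SQL
  'SELECT * FROM orders;'                  -- query executed on Postgres
);
```

For a full refresh every 15-20 minutes, one option is to run this as a BigQuery scheduled query and use CREATE OR REPLACE TABLE ... AS SELECT instead of INSERT, so each run replaces the previous copy.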
