The best GCP architecture for exporting Bigquery data to an external application with API

Time:05-14

I use the following GCP products together for a CRM system:

  • Cloud SQL
  • App Engine
  • BigQuery

Once a week an external application exports data from BigQuery in this way:

  1. The external application makes a request to App Engine with a token.
  2. App Engine retrieves the permissions for this token from Cloud SQL and performs some additional computation to obtain a list of allowed IDs.
  3. App Engine runs a BigQuery query filtered by these IDs, something like: SELECT * FROM table WHERE id IN(ids)
  4. App Engine responds to the external application with the unmodified query result as JSON.
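Step 3 above could be sketched as a small helper on the App Engine side (a minimal sketch; the table name and integer ID type are illustrative assumptions):

```python
def build_ids_query(table: str, ids: list[int]) -> str:
    """Build the step-3 query from the allowed IDs found in step 2."""
    if not ids:
        raise ValueError("token has no allowed IDs")
    # Cast every ID to int so only digits are inlined into the SQL text.
    # With the BigQuery client library, an ArrayQueryParameter and
    # `WHERE id IN UNNEST(@ids)` would be the fully parameterized equivalent.
    id_list = ", ".join(str(int(i)) for i in ids)
    return f"SELECT * FROM `{table}` WHERE id IN ({id_list})"
```

For example, `build_ids_query("mydataset.mytable", [3, 7])` produces ``SELECT * FROM `mydataset.mytable` WHERE id IN (3, 7)``.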

The problem is that the export is infrequent, but the amount of data can be large, and I don't want to push all of it through App Engine. What other GCP products are useful in this case? Remember I still need to check permissions via App Engine and Cloud SQL.

CodePudding user response:

It's unclear whether the JSON comes directly from the BigQuery query results, or whether you do additional processing in the application to render/format it. I'm assuming direct results.

An option that comes to mind is to leverage Cloud Storage. You can use the signed URL feature to provide a time-limited link to your (potentially large) results without making them publicly accessible.

This, coupled with BigQuery's ability to export results to GCS (either via an export job, or using the newer EXPORT DATA SQL statement), allows you to run a query and deliver the results directly to GCS.
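For the EXPORT DATA route, App Engine could wrap its existing filtered query in an EXPORT DATA statement and run it as an ordinary query job. A sketch of building that statement (the bucket path and JSON format are illustrative assumptions; note EXPORT DATA requires a `*` wildcard in the URI because results are written as sharded files):

```python
def build_export_statement(query: str, gcs_uri: str) -> str:
    """Wrap a SELECT in an EXPORT DATA statement that writes
    newline-delimited JSON straight to Cloud Storage."""
    if "*" not in gcs_uri:
        raise ValueError("EXPORT DATA requires a '*' wildcard in the uri")
    return (
        "EXPORT DATA OPTIONS (\n"
        f"  uri = '{gcs_uri}',\n"
        "  format = 'JSON',\n"
        "  overwrite = true\n"
        ") AS\n"
        f"{query}"
    )
```

App Engine then only issues the job and hands back a link to the exported file(s), instead of streaming the rows through its own responses.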

With this, you could simply redirect the user to the signed URL at the end of your current flow. There are complementary features here as well, such as GCS object lifecycle rules to age out and delete files automatically, so you don't need to concern yourself with the slow accumulation of exported results.
