How can I add data to BigQuery without problems with rate limits?

I currently have a system from which I want to send data to BigQuery via a Google Cloud Function (using Python). My idea was to do this in two simple steps:

  1. The system calls a Google Cloud Function and sends JSON data to that function
  2. The Google Cloud Function unpacks the data and stores it in a Google BigQuery table using the pandas-gbq package (a rough sketch of such a function follows this list).
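
For reference, here is a minimal sketch of what step 2 could look like, assuming an HTTP-triggered function; the project, dataset, and table names are placeholders:

```python
# Minimal sketch of an HTTP-triggered Cloud Function that appends the
# incoming JSON payload to BigQuery via pandas-gbq. Names are placeholders.
import pandas as pd
import pandas_gbq

def ingest(request):
    """Unpack the JSON payload and append it to a BigQuery table."""
    payload = request.get_json(silent=True)
    if payload is None:
        return ("No JSON payload received", 400)

    # Accept either a single record or a list of records.
    records = payload if isinstance(payload, list) else [payload]
    df = pd.DataFrame(records)

    pandas_gbq.to_gbq(
        df,
        destination_table="my_dataset.my_table",  # placeholder
        project_id="my-project",                  # placeholder
        if_exists="append",
    )
    return ("OK", 200)
```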

This seemed like a good idea until I realized that Google BigQuery has rate limits on adding data (see https://community.microstrategy.com/s/article/Known-Limitations-of-Adding-Data-into-Google-BigQuery?language=en_US for an example).

Because this system will trigger the function more than a thousand times a day, I think I need a better setup.

What would you recommend?

I have already thought of two options:

  1. Using Google Datastore to store the data instead, but I am not sure whether it has the same kind of rate limits, or whether I can easily transfer that data to BigQuery afterwards.
  2. Instead of sending data on a case-by-case basis, collecting (for example) 500 calls in the Google Cloud Function and, once the data for 500 cases has been received, sending those 500 cases to BigQuery in one go. I am not sure whether this would work, or whether it is even possible with Google Cloud Functions.

CodePudding user response:

The mentioned limits only apply to load jobs.

You can use streaming inserts instead, which are not subject to those load-job limits (streaming has its own, much higher quotas).

I am not sure whether pandas-gbq uses load jobs or streaming inserts under the hood.
In any case, you can also use the BigQuery Python client and its insert_rows_from_dataframe method, which uses the streaming API.
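
A minimal sketch of that approach, assuming placeholder project, dataset, and table names:

```python
# Minimal sketch of streaming rows into BigQuery with the official client.
# Project/dataset/table names are placeholders.
import pandas as pd
from google.cloud import bigquery

client = bigquery.Client(project="my-project")   # placeholder project
table = client.get_table("my_dataset.my_table")  # fetch schema for the insert

df = pd.DataFrame([{"name": "example", "value": 42}])

# insert_rows_from_dataframe streams the rows (insertAll under the hood)
# and returns one list of row errors per chunk; empty lists mean success.
errors = client.insert_rows_from_dataframe(table, df)
if any(errors):
    raise RuntimeError(f"Streaming insert failed: {errors}")
```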

CodePudding user response:

For your requirement, you can use the Google-provided Dataflow templates, which include templates for streaming data from Pub/Sub into BigQuery with Dataflow. The streaming pipeline reads JSON-formatted messages from Cloud Pub/Sub and writes them to BigQuery.

You can choose either the Pub/Sub Topic to BigQuery template or the Pub/Sub Subscription to BigQuery template, depending on your requirements. Data ingestion into BigQuery will incur some cost, which you can check in the BigQuery pricing documentation.
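
As a rough sketch of how the Cloud Function side could look with this setup, the function just forwards each payload to a Pub/Sub topic (project and topic names are placeholders, and it assumes one of the above templates is deployed against that topic):

```python
# Minimal sketch: forward the incoming JSON payload to Pub/Sub so a running
# Pub/Sub-to-BigQuery Dataflow template can stream it into the table.
# Project and topic names are placeholders.
import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "bq-ingest")  # placeholders

def ingest(request):
    """HTTP-triggered Cloud Function: publish the JSON payload to Pub/Sub."""
    payload = request.get_json(silent=True)
    if payload is None:
        return ("No JSON payload received", 400)

    # The template expects each message body to be a JSON object whose
    # fields match the destination table's schema.
    future = publisher.publish(topic_path, json.dumps(payload).encode("utf-8"))
    future.result()  # wait until the publish is acknowledged
    return ("OK", 200)
```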
