Export only raw data from firestore


I want to export all my existing and all future Firestore documents to BigQuery in a tabular format.

Example: If I have a User collection in firestore like:

{
    "name": "Jon",
    "score": 10
}

Then the BigQuery table should look like this:

 ---------------- ---------- ---------
|       id       |   name   |  score  |
 ---------------- ---------- ---------
| <document_id>  |   Jon    |    10   |
 ---------------- ---------- ---------

I have tried using

  1. Export Collections to BigQuery. This does not export the data in the above format; rather, it keeps a log of each insert/update/delete event.
  2. Manually creating a table in the dataset and importing the backup from Firestore. This does give me the BigQuery table view I want, but it is not updated automatically (as the extension does).

How do I achieve the tabular format shown above, with the table updating automatically whenever the Firestore documents change? Do I have to write my own Cloud Function to push all the data to BigQuery?

CodePudding user response:

The schema-views scripts that come with the "Export Collections to BigQuery" extension allow you to create views that let you query your data in a more tabular form, like you suggested. You can set them up once and then query against them even as new data comes in.
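For illustration, here is roughly what that could look like for the users collection from the question. You describe the columns you want in a schema file (say users_schema.json) and run the script against the dataset the extension writes to; the project name, dataset name, and flag values below are placeholders, so check the extension's documentation for the exact invocation.

{
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "score", "type": "number" }
  ]
}

npx @firebaseextensions/fs-bq-schema-views \
  --non-interactive \
  --project=your-project \
  --dataset=firestore_export \
  --table-name-prefix=users \
  --schema-files=./users_schema.json

The generated "latest" view should then expose one row per document (keyed by its document id) with the columns from the schema file, which is essentially the tabular shape you described.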

CodePudding user response:

  1. Export Collections to BigQuery. This does not export the data in the above format; rather, it keeps a log of each insert/update/delete event.

As explained by Michael, with the "Export Collections to BigQuery" extension you can run the schema-views script provided with the extension to create a view matching your desired data model.


Another approach would be to write a Cloud Function that writes directly to BigQuery. The following simple example shows how a scheduled Cloud Function reads the docs from the users collection and adds them to a BigQuery table.

It's up to you to adapt it. In particular, if you want the BigQuery table to be updated whenever Firestore documents are created/deleted/updated, you should use an onWrite() Cloud Function instead; a rough sketch of that variant follows after the example below.

const functions = require('firebase-functions');

const admin = require('firebase-admin');
admin.initializeApp();

const { BigQuery } = require('@google-cloud/bigquery');

const bigquery = new BigQuery();


exports.exportToBigQuery = functions.pubsub.schedule('every 24 hours').onRun(async (context) => {

    try {

        // Get the user docs !! UP TO YOU TO ADAPT THE QUERY !!
        const usersQuerySnapshot = await admin.firestore().collection("users").get();

        // Build one flat row per document, keyed by the document id
        const rows = [];

        usersQuerySnapshot.forEach((doc) => {
            rows.push({ id: doc.id, name: doc.get("name"), score: doc.get("score") });
        });

        // Use Environment configuration for these names: https://firebase.google.com/docs/functions/config-env
        const dataset = bigquery.dataset("dataset-name");
        const table = dataset.table("table-name");

        // Stream the rows into the BigQuery table
        await table.insert(rows);

        return null;

    } catch (error) {
        console.error(error);
        return null;
    }

});
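
For completeness, here is a minimal sketch of the onWrite() variant mentioned above, assuming the same users collection and placeholder dataset/table names. Note that BigQuery streaming inserts are append-only, so this sketch only appends the latest version of a created or updated document as a new row; properly reflecting updates and deletes would require deduplicating in a view or using DML, which is exactly the kind of bookkeeping the extension's changelog-plus-views approach does for you.

const functions = require('firebase-functions');
const { BigQuery } = require('@google-cloud/bigquery');

const bigquery = new BigQuery();

exports.syncUserToBigQuery = functions.firestore
    .document('users/{userId}')
    .onWrite(async (change, context) => {
        try {
            // The document was deleted: nothing to append in this simple sketch
            if (!change.after.exists) {
                return null;
            }

            const data = change.after.data();

            // Placeholder dataset/table names -- adapt to your project
            const table = bigquery.dataset("dataset-name").table("table-name");

            // Append the latest version of the document as a new row
            await table.insert([{
                id: context.params.userId,
                name: data.name,
                score: data.score,
            }]);

            return null;
        } catch (error) {
            console.error(error);
            return null;
        }
    });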