Is it ok to perform a full collection scan on a low volume firestore collection?

Time:11-02

My mobile application's backend is a few small Firebase/Google Cloud Functions that perform various CRUD operations on my Firestore database. I have run into an issue where I need to update all documents in a collection daily. A cron job triggers the function every day at a specified time. To avoid a full collection scan, I tried a few hacky workarounds for not being able to self-invoke functions the way you can with AWS Lambda, but ultimately these did not work. Since I am not expecting high volumes of data (each document has up to around 8 string fields, and I expect a maximum of 10,000 documents), I was thinking a full collection scan might not actually be that expensive. Has anyone had experience with full table scans on Firestore, and what was the performance like?

CodePudding user response:

"I have an expiration status field and I need this to be updated [daily] as the value of it is based on the date."

Instead of scanning the entire collection every day, consider using Cloud Tasks to trigger an update for each individual document exactly when it expires. This is likely to be more efficient over time, because the work scales with the number of expiring documents rather than with the total number of documents. For an example of triggering individual documents through Cloud Tasks, see Doug's blog post: How to schedule a Cloud Function to run in the future with Cloud Tasks (to build a Firestore document TTL).
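
As a rough illustration of the per-document approach, the sketch below builds the description of one scheduled HTTP task. The function name `build_expiration_task`, the `docPath` payload key, and the handler URL are hypothetical. The dictionary keys follow the `http_request` and `schedule_time` fields of the Cloud Tasks `Task` message; submitting the dictionary through the Cloud Tasks client is not shown, and whether the client accepts it in exactly this form is an assumption.

```python
import json
from datetime import datetime


def build_expiration_task(doc_path: str, expires_at: datetime, handler_url: str) -> dict:
    """Describe an HTTP task that should fire when one document expires.

    All names here are illustrative; only the top-level keys mirror the
    Cloud Tasks `Task` message (`http_request`, `schedule_time`).
    """
    if expires_at.tzinfo is None:
        # A naive datetime would be interpreted in local time by .timestamp().
        raise ValueError("expires_at must be timezone-aware")

    # The handler (a hypothetical HTTP function) receives the document path
    # and updates only that document.
    body = json.dumps({"docPath": doc_path}).encode("utf-8")

    return {
        "http_request": {
            "http_method": "POST",
            "url": handler_url,
            "headers": {"Content-Type": "application/json"},
            "body": body,
        },
        # Seconds since the Unix epoch at which the task should run.
        "schedule_time": {"seconds": int(expires_at.timestamp())},
    }
```

One such task would be created whenever a document is written with an expiry date, so the daily job over the whole collection is no longer needed.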


Even if you keep doing this task as you're doing it now, you shouldn't need to do a full table scan. Rather you should be able to use a query based on the current date and the field in the documents that determines whether they are expired. This means you're loading a subset of the data.
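
A minimal sketch of such a query, assuming each document has an `expiresAt` timestamp field and a `status` field (both names are hypothetical, as is the function name). The `collection` argument is expected to behave like a Firestore collection reference from the Python client, of which only `where`, `stream`, and `reference.update` are used.

```python
from datetime import datetime


def mark_expired(collection, now: datetime) -> int:
    """Flag every still-active document whose expiry date has passed.

    Returns the number of documents updated.
    """
    # Only documents that are active AND already past their expiry are read.
    # Combining an equality filter on one field with a range filter on
    # another typically requires a composite index.
    due = (
        collection
        .where("status", "==", "active")   # hypothetical status field
        .where("expiresAt", "<=", now)     # hypothetical expiry field
    )

    # Materialize the results first; fine for a few thousand documents.
    snapshots = list(due.stream())
    for snapshot in snapshots:
        snapshot.reference.update({"status": "expired"})
    return len(snapshots)
```

Because documents that are already marked as expired no longer match the query, each daily run reads only the documents that expired since the previous run.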

Finally, if the number of documents you need to process will continue to grow, implement cursor-based pagination so that you process the documents in batches and don't run out of memory as the number of documents grows.
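
Cursor-based pagination could look roughly like the sketch below. It assumes the same hypothetical `expiresAt` field, and a `collection` object that behaves like a Firestore collection reference from the Python client (`where`, `order_by`, `limit`, `start_after`, `stream`). The `handle` callback and the function name are placeholders for whatever per-document work is needed.

```python
from datetime import datetime
from typing import Callable


def process_due_in_pages(
    collection,
    now: datetime,
    handle: Callable[[object], None],
    page_size: int = 500,
) -> int:
    """Process all documents with expiresAt <= now, one page at a time.

    Only `page_size` snapshots are held in memory at once.
    Returns the number of documents processed.
    """
    base = (
        collection
        .where("expiresAt", "<=", now)   # hypothetical expiry field
        .order_by("expiresAt")           # a stable order is needed for cursors
        .limit(page_size)
    )

    cursor = None      # last snapshot of the previous page
    processed = 0
    while True:
        # Resume right after the last document that was already handled.
        query = base if cursor is None else base.start_after(cursor)
        page = list(query.stream())

        for snapshot in page:
            handle(snapshot)
        processed += len(page)

        # A short (or empty) page means there is nothing left to read.
        if len(page) < page_size:
            return processed
        cursor = page[-1]
```

The callback should not change `expiresAt` while the loop is running, since the cursor position depends on that field's value.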
