Does anyone have good ideas about what kind of data model makes sense in Firestore for time-dependent data?
I have a lot of event data that I would like to store in Firestore and then run an analysis on.
Each event has a timestamp, and I would like to run the aggregated analysis over, for example, 1 day of data, 7 days of data, X days of data, 1 month, X months, etc.
How should this be set up in Firestore? Seven days of event data is already too much to return to the client and analyze there. If I aggregate a predefined set of days beforehand in Firestore, the analysis is locked to those periods and you can't choose an arbitrary number of days. I would also need to keep updating the aggregated data every time new data arrives.
Any help much appreciated!
CodePudding user response:
As I understand it, you're looking to perform a query similar to:
SELECT hits, COUNT(*) FROM event_type_api WHERE start_date > TODAY - X GROUP BY hits
Firestore is a NoSQL database, so you cannot express this in SQL terms, but you can still find out how many documents a query matches. Reading every document in a collection just to count them is costly, which is why you should call count() instead: it computes the total on the server without downloading the documents. As you already mentioned, there is no "GROUP BY" in Firestore. However, we can achieve almost the same thing.
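Assuming you create a collection called "hits" whose documents have a field named "timestamp" of type timestamp, you can perform the following query, where x is the number of days to look back:
val cutoff = Timestamp(Date(System.currentTimeMillis() - TimeUnit.DAYS.toMillis(x))) // x is a Long; Timestamp is com.google.firebase.Timestamp
val queryByTimestamp = db.collection("hits").whereGreaterThan("timestamp", cutoff)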
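If you want to know how many documents the query matches, call count(), which builds an aggregate query, and then execute it on the server like this:
val countQuery = queryByTimestamp.count()
countQuery.get(AggregateSource.SERVER).addOnSuccessListener { snapshot -> val numberOfDocuments = snapshot.count }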
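Since the question asks for several windows (1 day, 7 days, X days), the cutoff for the timestamp filter can be computed once per window. The following is only a minimal sketch in plain Kotlin: cutoffDaysAgo and cutoffsFor are hypothetical helper names, not part of the Firestore SDK, and each resulting Date would still have to be passed to a query like the one above.

```kotlin
import java.time.Duration
import java.time.Instant
import java.util.Date

// Hypothetical helper: the point in time `days` days before `now`.
// `now` is a parameter so the function stays deterministic and easy to test.
fun cutoffDaysAgo(days: Long, now: Instant = Instant.now()): Date =
    Date.from(now.minus(Duration.ofDays(days)))

// Hypothetical helper: one cutoff per analysis window, keyed by window length in days.
fun cutoffsFor(windowsInDays: List<Long>, now: Instant = Instant.now()): Map<Long, Date> =
    windowsInDays.associateWith { days -> cutoffDaysAgo(days, now) }
```

For example, cutoffsFor(listOf(1L, 7L, 30L)) yields three cutoffs, and one count query could then be run per cutoff.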
The last thing is related to grouping. As mentioned before, Firestore doesn't offer an equivalent of "GROUP BY". However, there are two solutions for this use case. You can get all the documents in the "hits" collection and then group them in your application code by whatever fields you need. Or you can create a separate collection of pre-grouped documents to satisfy your needs. The latter is the recommended approach. And remember that duplicating data is quite a common practice when it comes to NoSQL databases.
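Both options can be sketched in plain Kotlin. This is an illustration under assumptions rather than a definitive implementation: the Event class, its fields, and the helper names are hypothetical, and the per-day bucket layout is one possible shape for the pre-grouped collection, not something the Firestore API prescribes. Per-day buckets have the advantage that an arbitrary number of days can be answered by summing a handful of small documents.

```kotlin
import java.time.Instant
import java.time.LocalDate
import java.time.ZoneOffset

// Hypothetical event shape: a type label plus the time the event occurred.
data class Event(val type: String, val timestamp: Instant)

// Option 1: group already-fetched events in application code,
// roughly "SELECT type, COUNT(*) ... GROUP BY type".
fun countByType(events: List<Event>): Map<String, Int> =
    events.groupingBy { it.type }.eachCount()

// Option 2: pre-group events into one bucket per UTC day. In Firestore, each
// map entry could be stored as one small document keyed by its date (assumption).
fun dailyCounts(events: List<Event>): Map<LocalDate, Int> =
    events.groupingBy { it.timestamp.atZone(ZoneOffset.UTC).toLocalDate() }.eachCount()

// Answer an arbitrary window by summing the buckets for the `days` days
// ending on `lastDay` (inclusive). Days without a bucket count as zero.
fun countForLastDays(buckets: Map<LocalDate, Int>, lastDay: LocalDate, days: Long): Int =
    (0 until days).map { offset -> buckets[lastDay.minusDays(offset)] ?: 0 }.sum()
```

With this layout, a new event only requires incrementing the bucket for its own day, so older aggregates never need to be recomputed.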