What are the performance advantages of using database aggregate functions as opposed to getting all the data and doing the computation on the server or storing the same data in multiple documents or rows?
What are the measurable pros and cons?
CodePudding user response:
It depends on your use case, but in general performing heavy computation directly on the database is better (unless you have some very complicated operation which performing on the database doesn't seem beneficial in terms of performance/development time/ease of understanding).
From a performance P.O.V here are some pointers
- A server is meant to handle multiple requests, listen to events, make api calls and a lot more, and it needs to be responsive at the same time. Clearly, a servers memory is a crucial resource, and so is its processing time. If you bring a large chunk of data into the server, it will reside in-memory and if the computation takes time, it will hamper the response time(synchronous/asynchronous computation). This will also affect the garbage collection of the server.
- Assuming you choose to perform computation on the server, a server is not optimized to perform such operations. Ex: Lets say you want to find the maximum of a million records, you receive a request, bring data in memory and perform the max operation. As the number of requests increase, you will keep bringing a million records in-memory(see where this goes?). Once you have the data, chances are your data needs to be scanned linearly to find the maximum(not bad but not good as well). The same thing when performed on a database, it wont create multiple copies of the data. A database stores data in an optimized way(indexes) and may also store some statistical data which may make aggregation operations cheap.
- A database can be scaled to a larger magnitude(horizontally/vertically) as compared to servers. Computation on a server would result in a bottleneck much quicker than it would if done on a database. In many cases, the database is a separate machine from the server, so you are utilizing the resources much better offloading computation to the database. You are also avoiding single point of failure by doing so.
TLDR
A database is meant and designed to handle heavy computation whereas a server is meant and designed to handle requests.