Home > Net >  Is it better to count in server side API using java stream() then using count query call repeatedly
Is it better to count in server side API using java stream() then using count query call repeatedly

Time:09-29

I want to count the number of rows in a table three times depending on three filters/conditions. I want to know which one of the following two ways is better for performance and cost-efficiency. We are using AWS as our server, java spring to develop server-side API and MySQL for the database.

  1. Use the count feature of MySQL to query three times in the database for three filtering criteria to get the three count result.
  2. Fetch all the rows of the table from the database first using only one query. Then using java stream three times based on three filtering criteria to get the three count result.

CodePudding user response:

It'll be better to go with option (1) in extreme cases. If it's slow to execute SELECT COUNT(*) FROM table then you should consider some tweak on SQL side. Not sure what you're using but I found this example for sql server

Assuming you go with Option (2) and you have hundreds of thousands of rows, I suspect that your application will run out of memory (especially under high load) before you have time to worry about slow response time from running SELECT count(*). Not to mention that you'll have lots of unnecessary rows and slow down transfer time between database and application

CodePudding user response:

A basic argument against doing counts in the app is that hauling lots of data from the server to the client is time-consuming. (There are rare situations where it is worth the overhead.) Note that your client and AWS may be quite some distance apart, thereby exacerbating the cost of shoveling lots of data. I am skeptical of what you call "server-side API". But even if you can run Java on the server, there is still some cost of shoveling between MySQL and Java.

Sometimes this pattern lets you get 3 counts with one pass over the data:

SELECT 
    SUM(status='ready') AS ready_count,
    SUM(status='complete') AS completed_count,
    SUM(status='unk') AS unknown_count,
    ...

The trick here is that a Boolean expression has a value of 0 (for false) or 1 (for true). Hence the SUM() works like a 'conditional count'.

  • Related