I am developing an experimental system in Java backed by a large MySQL DB. It contains thousands of records, and for each record a number of operations must be performed, in parallel.
I'm familiar with Java threads, but I don't know the best or most efficient way to use them on a large number of records from the DB.
Suppose we look at the following DB table:
Table technicians
ID NUMBER
DISTRICT_CODE NUMBER NOT NULL
EVENT_START_DATE DATE NOT NULL
EVENT_END_DATE DATE NOT NULL
INCHARGE NUMBER NOT NULL
EFFECTIVE_FROM DATE DEFAULT SYSDATE NOT NULL
EFFECTIVE_TO DATE
STATUS NUMBER NOT NULL
Then we'll perform the following extraction:
SELECT * FROM technicians WHERE INCHARGE = 23;
I'm now seriously debating whether to put the extracted data into a List (for example, an ArrayList) or some other data structure, and how to run a thread for each record individually. Note that each extraction contains about 4000 records and is repeated every 3 seconds.
The naive idea that has been raised is to query the DB, go through each record in a loop (e.g. sql.hasNext()), and run a ThreadPoolExecutor on each record, but I tend to believe that there are more efficient and faster ways.
Any suggestion is welcome.
Edit: I see questions have been raised about the actions to be taken for each of the records, so I will try to answer that.
For each row, we will call several different APIs for each field to check its correctness and get a type of answer (e.g. correct, incorrect, correct but the value is short, and so on).
It is important to note that each of these actions calls an API that is external to the system (located on a different remote server). A single field may therefore require several calls to different APIs, so high throughput and parallel work are important.
For example:
For the INCHARGE field - we will send the value to an external API source that will check the data, and if the information is correct then we will send the field again to another API and we will get information related to it.
Answer:
It seems you want to process some rows in a database every three seconds. Each time, you want to query for about four thousand rows. Each of those rows needs to be individually processed, without regard for the other rows in that table. It sounds like you are not updating the rows, but instead sending each row's data to other services, such as by making web service calls.
Yes, use executor services
So load your data into memory, as the volume seems low. Define a class to hold the data for each row. Since we are using this class primarily to communicate data transparently and immutably, define the class as a record.
record Technician ( int id , LocalDate eventStart , … ) {}
Instantiate these Technician objects as you loop the result set from your query.
For each Technician object, pass it to the constructor of a class implementing Callable. The call method of that class defines the work you need to do in processing that row's data, passing it to web service calls, etc.
A Callable returns a value. Let's define another record to signal success/failure and the ID of the record.
record TechnicianProcessingResult ( int id , boolean succeeded ) {}
Make that record be the type of our Callable.
class ProcessTechnicianTask implements Callable< TechnicianProcessingResult > {
    private final Technician technician ;
    ProcessTechnicianTask( Technician t ) {  // Constructor.
        this.technician = t ;
    }
    @Override
    public TechnicianProcessingResult call() {
        // Record components are read through accessor methods, hence id() rather than id.
        System.out.println( "Processing technician Id " + this.technician.id() );
        …
        return new TechnicianProcessingResult( this.technician.id() , true ) ;
        … or …
        return new TechnicianProcessingResult( this.technician.id() , false ) ;
    }
}
Instantiate a task for each Technician object you instantiated for each row retrieved from the database. Collect the tasks.
List< ProcessTechnicianTask > tasks = new ArrayList<>() ;
…
tasks.add( new ProcessTechnicianTask( nthTechnician ) ) ;
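Submit that collection of tasks to an executor service you have already established. For CPU-bound work, generally specify almost as many threads as you have CPU cores available. Tasks like these, which spend most of their time waiting on remote API calls, can usually make use of a larger pool.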
ExecutorService executorService = Executors.newFixedThreadPool( 5 ) ;
…
List< Future< TechnicianProcessingResult > > futures = executorService.invokeAll( tasks , 3 , TimeUnit.SECONDS ) ;
Notice the time-out arguments, to fire in case something goes wrong and your tasks take too much time to complete.
Check the list of futures to see if they are done, and if any were canceled, and check their result object.
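As a rough sketch of that checking step: the class name FutureCheckSketch, the method succeededIds, and the placeholder rule that even IDs "succeed" are all made up for illustration. A real task would call the remote APIs instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class FutureCheckSketch {
    // Stand-in for the result record defined above.
    record TechnicianProcessingResult(int id, boolean succeeded) {}

    // Runs one task per ID and returns the IDs whose task finished and reported success.
    static List<Integer> succeededIds(List<Integer> ids) {
        List<Callable<TechnicianProcessingResult>> tasks = new ArrayList<>();
        for (int id : ids) {
            // Placeholder work (assumption): even IDs succeed, odd IDs fail.
            tasks.add(() -> new TechnicianProcessingResult(id, id % 2 == 0));
        }
        ExecutorService executorService = Executors.newFixedThreadPool(5);
        List<Integer> succeeded = new ArrayList<>();
        try {
            // Blocks until all tasks finish or the time-out expires.
            List<Future<TechnicianProcessingResult>> futures =
                    executorService.invokeAll(tasks, 3, TimeUnit.SECONDS);
            for (Future<TechnicianProcessingResult> future : futures) {
                if (future.isCancelled()) {
                    continue; // Task did not finish before the time-out.
                }
                try {
                    TechnicianProcessingResult result = future.get();
                    if (result.succeeded()) {
                        succeeded.add(result.id());
                    }
                } catch (ExecutionException e) {
                    // The task itself threw an exception.
                    System.err.println("Task failed: " + e.getCause());
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // Preserve the interrupt status.
        } finally {
            executorService.shutdown();
        }
        return succeeded;
    }

    public static void main(String[] args) {
        System.out.println(succeededIds(List.of(1, 2, 3, 4)));
    }
}
```

invokeAll returns the futures in the same order as the submitted tasks, so each result can be matched back to its row by position as well as by ID.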
You want to repeat that every three seconds. So also create a single-threaded ScheduledExecutorService. Schedule a repeating task, a Runnable, that does the above work of querying the database, instantiating Technician objects, and assigning each to a ProcessTechnicianTask object, all submitted to our other executor service.
Be sure to gracefully shut down your executor service objects. Otherwise their backing thread pools may continue running indefinitely, like a zombie.
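One common way to do this, loosely following the pattern shown in the ExecutorService Javadoc, is sketched below. The helper name shutdownGracefully and the five-second wait are assumptions for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ShutdownSketch {
    // Returns true if the pool terminated within the allowed time.
    static boolean shutdownGracefully(ExecutorService pool, long timeout, TimeUnit unit) {
        pool.shutdown(); // Stop accepting new tasks; submitted tasks keep running.
        try {
            if (pool.awaitTermination(timeout, unit)) {
                return true;
            }
            pool.shutdownNow(); // Interrupt tasks that are still running.
            return pool.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // Preserve the interrupt status.
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(5);
        pool.execute(() -> System.out.println("working"));
        System.out.println("terminated: " + shutdownGracefully(pool, 5, TimeUnit.SECONDS));
    }
}
```

The same helper works for the ScheduledExecutorService, since that interface extends ExecutorService.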