I am developing an experimental system in Java backed by a large MySQL DB containing thousands of records, each of which requires a number of operations to be performed, in parallel.

I'm familiar with Java threads, but I don't know the best or most efficient way to apply them to large numbers of records from the DB.

Suppose we look at the following DB table:

Table technicians
    ID                  NUMBER
    DISTRICT_CODE       NUMBER NOT NULL
    EVENT_START_DATE    DATE NOT NULL
    EVENT_END_DATE      DATE NOT NULL
    INCHARGE            NUMBER NOT NULL
    EFFECTIVE_FROM      DATE DEFAULT SYSDATE NOT NULL
    EFFECTIVE_TO        DATE
    STATUS              NUMBER NOT NULL

Then we run the following query:

SELECT * FROM technicians WHERE INCHARGE = 23;

Now I'm seriously debating whether to put the extracted data into a List (for example, an ArrayList) or some other data structure, and how to run a thread for each record individually. (Note that each extraction returns about 4,000 records, and it repeats every 3 seconds, over and over again.)

The naive idea that has come up is to query the DB, loop over each record in the result set (e.g. while (rs.next())), and submit each record to a ThreadPoolExecutor, but I suspect there are more efficient and faster ways.

Any suggestions are welcome.

Edit: Some comments asked about the actions to be taken for each record, so I'll try to answer that.

For each row, we call several different APIs per field to validate it and classify the result (e.g. correct, incorrect, correct but the value is too short, etc.).

Importantly, each of these actions is a call to an API external to the system (hosted on a different, remote server), and a single field sometimes requires several calls to different APIs, so throughput and parallelism are important.

For example:

For the INCHARGE field, we send the value to an external API that checks it; if the value is correct, we then send the field to a second API to retrieve information related to it.
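A minimal sketch of that two-step INCHARGE check. Everything here is hypothetical: the question does not name the external services, so the InchargeApis interface, its two methods and the CheckOutcome categories (borrowed from the examples above) are placeholders for whatever HTTP client the real system uses. The point it shows is that the second API is only called when the first reports the value as correct.

```java
import java.util.Optional;

// Hypothetical outcome categories, taken from the examples in the question.
enum CheckOutcome { CORRECT, INCORRECT, CORRECT_BUT_SHORT }

// Hypothetical client for the two external services (real APIs are not specified).
interface InchargeApis {
    CheckOutcome validate(int incharge);           // first remote call: check the value
    Optional<String> lookupDetails(int incharge);  // second remote call: fetch related info
}

// Result for one field: the outcome plus any details returned by the second call.
record FieldCheck(CheckOutcome outcome, Optional<String> details) {}

final class InchargeChecker {
    private final InchargeApis apis;

    InchargeChecker(InchargeApis apis) {
        this.apis = apis;
    }

    // Calls the second API only when the first one says the value is correct.
    FieldCheck check(int incharge) {
        CheckOutcome outcome = apis.validate(incharge);
        if (outcome != CheckOutcome.CORRECT) {
            return new FieldCheck(outcome, Optional.empty());
        }
        return new FieldCheck(outcome, apis.lookupDetails(incharge));
    }
}
```

Each such chain is sequential within one row, which is fine: the parallelism comes from running many rows at once.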

Answer:

It seems you want to process some rows in a database every three seconds. Each time, you query for about four thousand rows. Each of those rows needs to be processed individually, without regard to the other rows in that table. It sounds like you are not updating the rows, but instead sending each row's data to other services, such as via web service calls.

Yes, use executor services

So load your data into memory, as the volume seems low. Define a class to hold the data for each row. Since we use this class primarily to carry data transparently and immutably, define it as a record.

record Technician ( int id , LocalDate eventStart , … ) {}

Instantiate these Technician objects as you loop through the result set from your query.
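Loading the rows could look roughly like this. It is a sketch under assumptions: the record's field names and Java types are guessed from the question's schema, the nullable EFFECTIVE_TO column maps to a null LocalDate, and reading DATE columns with getObject(column, LocalDate.class) requires a JDBC 4.2 driver that supports that mapping (current MySQL Connector/J should). TechnicianRepository and loadByInCharge are hypothetical names.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Assumed mapping of the question's technicians table; effectiveTo may be null.
record Technician(int id, int districtCode, LocalDate eventStart, LocalDate eventEnd,
                  int inCharge, LocalDate effectiveFrom, LocalDate effectiveTo, int status) {}

final class TechnicianRepository {

    // Runs the question's query with a bind parameter and maps each row to an immutable record.
    static List<Technician> loadByInCharge(Connection conn, int inCharge) throws SQLException {
        String sql = "SELECT ID, DISTRICT_CODE, EVENT_START_DATE, EVENT_END_DATE, INCHARGE, "
                   + "EFFECTIVE_FROM, EFFECTIVE_TO, STATUS FROM technicians WHERE INCHARGE = ?";
        List<Technician> result = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, inCharge);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    result.add(new Technician(
                            rs.getInt("ID"),
                            rs.getInt("DISTRICT_CODE"),
                            rs.getObject("EVENT_START_DATE", LocalDate.class),
                            rs.getObject("EVENT_END_DATE", LocalDate.class),
                            rs.getInt("INCHARGE"),
                            rs.getObject("EFFECTIVE_FROM", LocalDate.class),
                            rs.getObject("EFFECTIVE_TO", LocalDate.class), // null when the column is NULL
                            rs.getInt("STATUS")));
                }
            }
        }
        return result; // about 4,000 small records per cycle, so a plain ArrayList is enough
    }
}
```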

Pass each Technician object to the constructor of a class implementing Callable. The call method of that class defines the work you need to do to process that row's data: passing it to web service calls, etc.

A Callable returns a value. Let's define another record that reports the ID of the row and whether its processing succeeded.

record TechnicianProcessingResult ( int id , boolean succeeded ) {}

Use that record as the type parameter of our Callable.

import java.util.concurrent.Callable ;

class ProcessTechnicianTask implements Callable< TechnicianProcessingResult > {

    private final Technician technician ;

    ProcessTechnicianTask( Technician t ) { // Constructor.
        this.technician = t ;
    }

    @Override
    public TechnicianProcessingResult call() {
        System.out.println( "Processing technician Id " + this.technician.id() );
        boolean succeeded = … ; // Do this row's work here (web service calls, etc.).
        return new TechnicianProcessingResult( this.technician.id() , succeeded ) ;
    }
}
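A self-contained variant of the task that can be exercised without a database or network, assuming the per-row work is injected as a Predicate<Technician> (a hypothetical stand-in for the real web service calls). The Technician record is cut down to two fields for brevity. Any RuntimeException thrown by the work is reported as a failed result instead of propagating.

```java
import java.time.LocalDate;
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Cut-down row record, for this sketch only.
record Technician(int id, LocalDate eventStart) {}

record TechnicianProcessingResult(int id, boolean succeeded) {}

final class ProcessTechnicianTask implements Callable<TechnicianProcessingResult> {
    private final Technician technician;
    private final Predicate<Technician> work; // hypothetical stand-in for the external API calls

    ProcessTechnicianTask(Technician technician, Predicate<Technician> work) {
        this.technician = technician;
        this.work = work;
    }

    @Override
    public TechnicianProcessingResult call() {
        boolean succeeded;
        try {
            succeeded = work.test(technician);
        } catch (RuntimeException e) {
            // A failing remote call marks this row as failed instead of surfacing as an exception.
            succeeded = false;
        }
        return new TechnicianProcessingResult(technician.id(), succeeded);
    }
}
```

Catching inside call() keeps one bad row from showing up as an ExecutionException later; if you would rather see the exception itself, drop the try/catch and inspect the future instead.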

Instantiate a task for each Technician object created from a row retrieved from the database, and collect the tasks.

List< ProcessTechnicianTask > tasks = new ArrayList<>() ;
…
tasks.add( new ProcessTechnicianTask( nthTechnician ) ) ;

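Submit that collection of tasks to an executor service you have already established. For CPU-bound work, a common rule of thumb is about as many threads as you have CPU cores. These tasks, however, spend most of their time waiting on remote API calls, so a larger pool may be appropriate.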
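Because each task mostly waits on remote APIs, one way to pick a pool size is the heuristic from Java Concurrency in Practice (Goetz et al.): threads = cores * target CPU utilization * (1 + wait time / compute time). The sketch below only evaluates that formula; PoolSizing.threadsFor is a hypothetical helper, and the wait/compute ratio is something you would have to measure for your APIs.

```java
// Pool-size heuristic from "Java Concurrency in Practice":
// threads = cores * targetUtilization * (1 + waitTime / computeTime)
final class PoolSizing {

    static int threadsFor(int cores, double targetUtilization, double waitToComputeRatio) {
        if (cores < 1 || targetUtilization <= 0 || targetUtilization > 1 || waitToComputeRatio < 0) {
            throw new IllegalArgumentException("Invalid sizing inputs");
        }
        // Round to the nearest whole thread, but never go below one.
        return Math.max(1, (int) Math.round(cores * targetUtilization * (1 + waitToComputeRatio)));
    }
}
```

For example, threadsFor(Runtime.getRuntime().availableProcessors(), 1.0, 19.0) on an 8-core machine suggests 160 threads for tasks that wait 19 times longer than they compute. The external APIs may also enforce rate limits, which would cap useful parallelism regardless of this estimate.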

ExecutorService executorService = Executors.newFixedThreadPool( 5 ) ;
…
List< Future< TechnicianProcessingResult > > futures = executorService.invokeAll( tasks , 3 , TimeUnit.SECONDS ) ;

Notice the time-out arguments: if the tasks have not all finished within that time (for example, because a remote call hangs), invokeAll returns anyway and cancels the unfinished tasks.

Check each future in the list: whether it was cancelled and, if not, its result object. Calling get on a future whose task threw an exception raises an ExecutionException.
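Running one batch and tallying the futures could be sketched as below. It assumes tasks whose result is the TechnicianProcessingResult record (redeclared so the block compiles on its own); BatchRunner and BatchSummary are hypothetical names. After the timed invokeAll returns, every future is done, so each one is either cancelled (it ran past the time-out) or can be read with get without blocking.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Same result record as in the answer, repeated so this block compiles on its own.
record TechnicianProcessingResult(int id, boolean succeeded) {}

// Hypothetical tally of one batch.
record BatchSummary(int succeeded, int failed, int errored, int cancelled) {}

final class BatchRunner {

    // Runs every task under one overall time limit and classifies each outcome.
    static BatchSummary runBatch(ExecutorService pool,
                                 List<? extends Callable<TechnicianProcessingResult>> tasks,
                                 long timeout, TimeUnit unit) {
        List<Future<TechnicianProcessingResult>> futures;
        try {
            // Returns when all tasks are done or the time-out expires; unfinished tasks are cancelled.
            futures = pool.invokeAll(tasks, timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag before bailing out
            throw new IllegalStateException("Interrupted while waiting for the batch", e);
        }
        int succeeded = 0, failed = 0, errored = 0, cancelled = 0;
        for (Future<TechnicianProcessingResult> future : futures) {
            if (future.isCancelled()) { // still running when the time-out hit
                cancelled++;
                continue;
            }
            try {
                // The future is already done here, so get() returns immediately.
                if (future.get().succeeded()) succeeded++; else failed++;
            } catch (ExecutionException e) { // the task itself threw
                errored++;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while reading results", e);
            }
        }
        return new BatchSummary(succeeded, failed, errored, cancelled);
    }
}
```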

You want to repeat that every three seconds, so also create a single-threaded ScheduledExecutorService. Schedule a repeating Runnable (the repeating methods scheduleAtFixedRate and scheduleWithFixedDelay accept only a Runnable, not a Callable) that does the above work: querying the database, instantiating Technician objects, wrapping each in a ProcessTechnicianTask, and submitting them all to our other executor service.
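A sketch of the repeating schedule, assuming each polling cycle (query, build tasks, run the batch) is wrapped in a Runnable; PollingScheduler is a hypothetical helper. Two details matter here. If an execution of a periodic task throws, ScheduledExecutorService suppresses all later executions, so the cycle is guarded with try/catch. And scheduleWithFixedDelay waits the full delay after each cycle finishes, whereas scheduleAtFixedRate would start the next cycle right away whenever a batch overran its period.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class PollingScheduler {

    // A periodic task that throws is never run again, so RuntimeExceptions are caught and logged here.
    static Runnable guarded(Runnable cycle) {
        return () -> {
            try {
                cycle.run();
            } catch (RuntimeException e) {
                System.err.println("Polling cycle failed, will retry next cycle: " + e);
            }
        };
    }

    // Runs the first cycle immediately, then each next one `delay` after the previous cycle ends.
    static ScheduledFuture<?> start(ScheduledExecutorService scheduler, Runnable cycle,
                                    long delay, TimeUnit unit) {
        return scheduler.scheduleWithFixedDelay(guarded(cycle), 0, delay, unit);
    }
}
```

With the numbers from the question, this would be called as start(Executors.newSingleThreadScheduledExecutor(), cycle, 3, TimeUnit.SECONDS).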

Be sure to shut down your executor services gracefully. Otherwise their backing thread pools may continue running indefinitely, like zombies.
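The usual two-phase shutdown, adapted from the example in the ExecutorService Javadoc, could be sketched as below (ExecutorShutdown is a hypothetical holder class). shutdown stops new submissions and lets running tasks finish; shutdownNow interrupts whatever is still running after the grace period.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

final class ExecutorShutdown {

    // Two-phase shutdown: let running tasks finish, then interrupt stragglers.
    static void shutdownAndAwaitTermination(ExecutorService pool, long timeout, TimeUnit unit) {
        pool.shutdown(); // reject new tasks; already-submitted ones keep running
        try {
            if (!pool.awaitTermination(timeout, unit)) {
                pool.shutdownNow(); // interrupt tasks that are still running
                if (!pool.awaitTermination(timeout, unit)) {
                    System.err.println("Executor did not terminate");
                }
            }
        } catch (InterruptedException e) {
            pool.shutdownNow(); // we were interrupted ourselves: cancel everything
            Thread.currentThread().interrupt(); // and preserve the interrupt status
        }
    }
}
```

Shutting down the scheduler first means no new polling cycle starts while the worker pool is draining.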