We have a batch job that needs to update millions of rows (not all rows) of a table with a particular value.
We have the following implementation :
We process say e.g. 10K records at a time and fire the Update query with hardcoded values in the IN clause like so :
@Query("UPDATE Order or SET or.status= 'PROCESSED' WHERE or.orderId IN (:orderIds)")
@Modifying
void updateProcessedOrders(@Param("orderIds") List<String> orderIds);
The first limitation is that the Update with IN clause can update only 1000 records at a time. Hence we have to split the list of Ids in a batch of 1000s and then create an Update query for each of them. So instead of one query for 10k orderIds we have to fire 10 update queries.
The job works in a reasonable amount of time for 10 to 15 million records however for one of the clients the amount of data is to the tune of 200 million. For 200 million the job takes 2 days to complete which is unacceptable.
So my question is : Is there a more efficient way of updating millions of records with the same value?
CodePudding user response:
One option to remove the 1000 elements in IN list restriction is to create another table, e.g.
create table order_ids (orderId number constraint pk_oid primary key);
and insert all ID values you're working with. It means not 1000 by 1000, but as many as needed (millions, possibly) at once.
Then
update order a set a.status = 'PROCESSED'
where exists (select null
from order_ids b
where b.orderId = a.orderId);
(table's primary key means that Oracle will implicitly index that column; if you allow duplicates, then remove primary key and create index on that column).