performance issue while calling batchUpdate by java to update multiple rows on mysql 8-CodePudding

I met performance issue with batchUpdate for multiple rows on mysql 8 by using java.

There are more than 400,000 rows in the table , I was trying to update a column by it's id.

The sql is like: update table_name set status='status' where id = 1.

I used batchUpdate to update 10,000 rows per time.

There is an index for the column "id", and the parameter was set to jdbcUrl too:

rewriteBatchedStatements=true

The java code is :

String sql = "update table_name set status='status' where id=?";
try(Connection conn = ds.getConnection();PreparedStatement ps = conn.prepareStatement(sql)){
            for(int i=0;i<idList.size();i  ){
                ps.setInt(1,idList.get(i));
                ps.addBatch();
                if((i000==0&&i>0)||i==idList.size()-1){
                    System.out.println("now we execute from " (i-10000) " to " i);
                    long s = System.currentTimeMillis();
                    ps.executeBatch();
                    System.out.println("execute from " (i-10000) " to " i " took " (System.currentTimeMillis()-s));
                }
            }
        }catch(Exception ex){
            ex.printStackTrace();
        }

By executing this code, I found it was very slow. It took about 60 to 90 seconds to update 10,000 records.

But when I called this code again, I found it only took 1 second to finish the job ,only if these rows were updated once.

I just wonder why the performace is so different. Does it cause by some cache mechanism?

When the performance was poor, I checked the innodb status, and found the update operation was waiting for handler commit. So I understand why it is slow. But why it was much faster when I run it again?

The phenomenon only occurs when I am not using transaction. Once the transaction is added to the code, it always takes about 1 second.

CodePudding user response：

UPDATE must save the old copy of each row. It tries to save it in a log file, but if that fills up, it does it in a slower way, using the actual tables. I feel much safer doing batches of only 100 to 1000.

Also, any secondary indexes are queued up for later storing to disk. (Don't worry, even a system crash will not lose them.)

10,000 is well into "diminishing returns", so there is no advantage in going that high. And as mentioned above, a potential disadvantage.

When an UPDATE does not change any values, it can run faster.

To change an entire column of a table smells like a design flaw. Perhaps a timestamp would let you discover the "status" during any subsequent SELECT? This might completely eliminate the big UPDATE. Or maybe the status can be stored in another table?