Why bulk update never conflicts with update-by-query requests in Elasticsearch

Time:10-04

I keep two scripts running. One sends bulk update requests to the test index:

while true; do
    s=$(tr -dc A-Za-z0-9 < /dev/urandom | head -c 10)
    curl -s -X POST 'localhost:9200/test/_bulk' -H 'Content-Type: application/x-ndjson' -d \
    '{ "update": { "_index": "test", "_id": "1" } }
    { "doc": { "name": "update", "foo": "'$s'" } }
    { "update": { "_index": "test", "_id": "2" } }
    { "doc": { "name": "update", "foo": "'$s'" } }
    { "update": { "_index": "test", "_id": "3" } }
    { "doc": { "name": "update", "foo": "'$s'" } }
'
    echo ''
done

The other sends update-by-query requests against the same documents (I have to sleep after each request, since a request may conflict with the previous one if they are sent too frequently):

while true; do
    s=$(tr -dc A-Za-z0-9 < /dev/urandom | head -c 10)
    curl -s -X POST 'localhost:9200/test/_update_by_query' -H 'Content-Type: application/json' -d \
'{
    "query": {
        "match": {
            "name": {
                "query": "update"
            }
        }
    },
    "script": {
        "lang": "painless",
        "source": "ctx._source['"'foo'"'] = '"'$s'"'"
    }
}'
    echo ''
    sleep 1
done

Judging from the output of the two scripts, the bulk responses never contain a conflict failure. All conflicts happen on the update-by-query side.

The conflict error message reads: version conflict, required seqNo [66], primary term [1]. current document has seqNo [67] and primary term [1]. From this it seems that the conflict happens while the operation is being copied from the primary shard to a replica. But bulk also needs to do that and increase the seqNo, right?
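The seqNo and primary term in that message are the values Elasticsearch uses for optimistic concurrency control, and the same check can be requested explicitly with the `if_seq_no` and `if_primary_term` parameters of the index API. Below is a minimal sketch for provoking the same kind of conflict by hand, assuming the `test` index from the scripts above and a node on localhost:9200. The helper names `conditional_index_url` and `conditional_index` are made up for illustration.

```shell
# Hypothetical helper: build the URL for a write that only succeeds if the
# document still has the given seqNo and primary term.
conditional_index_url() {
    local id=$1 seq_no=$2 primary_term=$3
    printf 'localhost:9200/test/_doc/%s?if_seq_no=%s&if_primary_term=%s' \
        "$id" "$seq_no" "$primary_term"
}

# Hypothetical helper: index a document only if nobody has changed it since
# the given seqNo was read. If the document has moved on (for example from
# seqNo 66 to 67), Elasticsearch answers with a version conflict (HTTP 409)
# instead of writing.
conditional_index() {
    curl -s -X PUT "$(conditional_index_url "$1" "$2" "$3")" \
        -H 'Content-Type: application/json' \
        -d '{ "name": "update", "foo": "manual" }'
}
```

Calling `conditional_index 1 66 1` while the bulk script is running should fail with a version conflict as soon as document 1 has been written again.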

Is there any possibility that update-by-query succeeds but bulk conflicts and fails sometimes?

CodePudding user response:

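Your bulk requests send update actions with a partial doc. Each one reads the current version of the document on the primary shard and writes the merged result back immediately, so there is almost no window in which another write can slip in. A conflict on this side is therefore very unlikely, though not strictly impossible.

The update-by-query requests work differently: they first take a snapshot of the matching documents and later write each one back on the condition that its seqNo and primary term have not changed. That gap between reading and writing is why the conflicts show up on this side.

If a bulk request modifies a document after update-by-query has taken its snapshot but before it writes that document back, you get a conflict.

If your bulk request comes after the update-by-query request has updated a document, nothing fails: the bulk update is applied on top of the current version and simply overwrites the value of foo set by update-by-query.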
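Both APIs have a documented setting for version conflicts: `retry_on_conflict` on an update action, and `conflicts=proceed` on update-by-query. The sketch below shows how they could be added to the two scripts from the question, again assuming the `test` index on localhost:9200. The function names and the retry count of 3 are arbitrary choices for illustration.

```shell
# Hypothetical helper: print an NDJSON bulk body that updates ids 1..3 and
# lets each update retry up to 3 times if it hits a version conflict.
bulk_body() {
    local s=$1 id
    for id in 1 2 3; do
        printf '{ "update": { "_index": "test", "_id": "%s", "retry_on_conflict": 3 } }\n' "$id"
        printf '{ "doc": { "name": "update", "foo": "%s" } }\n' "$s"
    done
}

# Send the bulk body; --data-binary @- reads it from stdin and keeps newlines.
send_bulk() {
    bulk_body "$1" | curl -s -X POST 'localhost:9200/test/_bulk' \
        -H 'Content-Type: application/x-ndjson' --data-binary @-
}

# Run update-by-query so that conflicting documents are counted and skipped
# instead of aborting the request.
update_by_query_proceed() {
    curl -s -X POST 'localhost:9200/test/_update_by_query?conflicts=proceed' \
        -H 'Content-Type: application/json' \
        -d '{ "query": { "match": { "name": "update" } }, "script": { "lang": "painless", "source": "ctx._source.foo = params.s", "params": { "s": "'"$1"'" } } }'
}
```

With `conflicts=proceed`, the update-by-query response reports the skipped documents in its `version_conflicts` counter, so the sleep between requests is no longer needed just to avoid failures.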
