ElasticSearch - Compile Error on Adding a Field?

Time:12-10

Using Python, I'm trying to go row by row through an Elasticsearch index with 12 billion documents and add a field to each document. The field is named direction and will contain "e" for some values of the field src and a different value for others. For this particular _id, the field should contain an "e".

from elasticsearch import Elasticsearch
es = Elasticsearch(["https://myESserver:9200"],
                   http_auth=('myUsername', 'myPassword'))

query_to_add_direction_field = {
    "script": {
        "inline": "direction=\"e\"",
        "lang": "painless"
    },
    "query": {"constant_score": {
        "filter": {"bool": {"must": [{"match": {"_id": "YKReAoQBk7dLIXMBhYBF"}}]}}}}
}

results = es.update_by_query(index="myIndex-*", body=query_to_add_direction_field)

I'm getting this error:

elasticsearch.BadRequestError: BadRequestError(400, 'script_exception', 'compile error')

I'm new to Elasticsearch. How can I correct my query so that it does not throw an error?

UPDATE:

I updated the code like this:

query_find_id = {
    "size": "1",
    "query": {
        "bool": {
            "filter": {
                "term": {
                    "_id": "YKReAoQBk7dLIXMBhYBF"
                }
            }
        }
    }
}
query_to_add_direction_field = {
    "script": {
        "source": "ctx._source['egress'] = true",
        "lang": "painless"
    },
    "query": {
        "bool": {
            "filter": {
                "term": {
                    "_id": "YKReAoQBk7dLIXMBhYBF"
                }
            }
        }
    }
}

results = es.search(index="traffic-*", body=query_find_id)
results = es.update_by_query(index="traffic-*", body=query_to_add_direction_field)
results_after_update = es.search(index="traffic-*", body=query_find_id)

The code now runs without errors... I think I may have fixed it.

I say I think I may have fixed it because if I run the same code again, I get a version_conflict_engine_exception error on the call to update_by_query... but I think that just means the big 12B-row index is still being updated to match the change I made. Does that sound possibly accurate?

Answer:

Please try the following query:

{
  "script": {
    "source": "ctx._source.direction = 'e'",
    "lang": "painless"
  },
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            {
              "match": {
                "_id": "YKReAoQBk7dLIXMBhYBF"
              }
            }
          ]
        }
      }
    }
  }
}
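
In Python, the query above could be applied roughly as follows. This is a minimal sketch rather than the only way to do it: the helper names build_direction_update and add_direction are made up for illustration, passing the value through script params is a choice made here and not part of the query above, and client is assumed to be an Elasticsearch instance like the es object in the question. The original script failed to compile because a bare direction is not a declared variable in Painless; in an update script, document fields are reached through ctx._source.

```python
def build_direction_update(doc_id, direction):
    """Build an update_by_query body that sets `direction` on one document."""
    return {
        "script": {
            # Document fields are assigned through ctx._source in an update script.
            "source": "ctx._source.direction = params.direction",
            "lang": "painless",
            # Passing the value as a parameter keeps the script text constant.
            "params": {"direction": direction},
        },
        # Same filter the question's updated code uses to select one document.
        "query": {"bool": {"filter": {"term": {"_id": doc_id}}}},
    }


def add_direction(client, index, doc_id, direction):
    """Run the update; `client` is assumed to expose update_by_query."""
    body = build_direction_update(doc_id, direction)
    return client.update_by_query(index=index, body=body)
```

A call would then look like add_direction(es, "traffic-*", "YKReAoQBk7dLIXMBhYBF", "e").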

Regarding version_conflict_engine_exception: it happens when the version of a document is not the one the update_by_query operation expects, for example because another process updated that document at the same time.

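You can add conflicts=proceed to the request (/_update_by_query?conflicts=proceed) to work around the issue. Conflicting documents are then counted and skipped instead of aborting the whole operation.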

Read more about conflicts here:

https://www.elastic.co/guide/en/elasticsearch/reference/8.5/docs-update-by-query.html#docs-update-by-query-api-desc
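
With the Python client, the same setting can be passed as a keyword argument. The sketch below assumes client is an Elasticsearch instance like the es object in the question; the helper name update_skipping_conflicts is hypothetical.

```python
def update_skipping_conflicts(client, index, body):
    """Run update_by_query, counting version conflicts instead of aborting."""
    response = client.update_by_query(
        index=index,
        body=body,
        conflicts="proceed",  # same effect as ?conflicts=proceed on the URL
    )
    # The response reports how many documents were updated and how many
    # were skipped because of a version conflict.
    return response["updated"], response["version_conflicts"]
```

A non-zero second value means some documents were skipped and may need another pass.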

If you think the conflict is temporary, you can retry the operation. When updating a single document with the Update API (rather than update_by_query), the retry_on_conflict parameter does this for you:

retry_on_conflict
(Optional, integer) Specify how many times should the operation be retried when a conflict occurs. Default: 0.
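
As a rough illustration, retry_on_conflict can be passed to the Python client's single-document update call. This sketch rests on several assumptions: client is an Elasticsearch instance like the es object in the question, the helper name set_direction_with_retry and the default of 3 retries are invented here, and index must be one concrete index name, since a single-document update does not accept a wildcard pattern such as traffic-*.

```python
def set_direction_with_retry(client, index, doc_id, direction, retries=3):
    """Update one document by id, retrying on version conflicts."""
    body = {
        "script": {
            "source": "ctx._source.direction = params.direction",
            "lang": "painless",
            "params": {"direction": direction},
        }
    }
    return client.update(
        index=index,  # assumed to be a concrete index, not a wildcard
        id=doc_id,
        body=body,
        retry_on_conflict=retries,  # number of retries before giving up
    )
```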