BadRequestError when updating Elasticsearch field with Painless script

I'm trying to update a string field in Elasticsearch using a Painless script that extracts text from another field with a regex. The script is invoked from Python, e.g.:

es.update_by_query(index='testrss', query=qry, script=scr)

In my example, the qry filter matches a single document with the following value:

{'body_text': "Purpose prong invitations Homely wine pocketses\nSOURCE: THE NY TIMES, NEW YORK\nReaches stealing jambags Azog pull ask" }

I want to extract THE NY TIMES, NEW YORK into a new field testxy.

As a sanity check, the following scr works fine:

scr = {
    "lang": "painless",
    "source": "ctx._source.testxy = /[aeiou]/.matcher(ctx._source.body_text).replaceAll('')"
}

It updates testxy to this:

{
...
 '_source': {'testxy': 'Prps prng nvttns Hmly wn pcktss\nSOURCE: THE NY TIMES, NEW YORK\nRchs stlng jmbgs Azg pll sk',
...
}

However, regex string extraction fails:

scr = {
    "lang": "painless",
    "source": "ctx._source.testxy = /SOURCE.*?\n/.matcher(ctx._source.body_text).group(1)"
}

This errors with:

---------------------------------------------------------------------------
BadRequestError                           Traceback (most recent call last)
/var/folders/8l/d9m87qtx2yn1bc86txmr30wh0000gn/T/ipykernel_57473/2559631365.py in <module>
----> 1 es.update_by_query(index='testrss', query=qry, script=scr)

/opt/anaconda3/lib/python3.8/site-packages/elasticsearch/_sync/client/utils.py in wrapped(*args, **kwargs)
    412                         pass
    413 
--> 414             return api(*args, **kwargs)
    415 
    416         return wrapped  # type: ignore[return-value]

/opt/anaconda3/lib/python3.8/site-packages/elasticsearch/_sync/client/__init__.py in update_by_query(self, index, allow_no_indices, analyze_wildcard, analyzer, conflicts, default_operator, df, error_trace, expand_wildcards, filter_path, from_, human, ignore_unavailable, lenient, max_docs, pipeline, preference, pretty, query, refresh, request_cache, requests_per_second, routing, script, scroll, scroll_size, search_timeout, search_type, slice, slices, sort, stats, terminate_after, timeout, version, version_type, wait_for_active_shards, wait_for_completion)
   4715         if __body is not None:
   4716             __headers["content-type"] = "application/json"
-> 4717         return self.perform_request(  # type: ignore[return-value]
   4718             "POST", __path, params=__query, headers=__headers, body=__body
   4719         )

/opt/anaconda3/lib/python3.8/site-packages/elasticsearch/_sync/client/_base.py in perform_request(self, method, path, params, headers, body)
    319                     pass
    320 
--> 321             raise HTTP_EXCEPTIONS.get(meta.status, ApiError)(
    322                 message=message, meta=meta, body=resp_body
    323             )

BadRequestError: BadRequestError(400, 'script_exception', 'compile error')

I've also tried:

scr = {
    "lang": "painless",
    "source": "Pattern p = Pattern.compile(\"SOURCE\"); Matcher m = p.matcher(ctx._source.body_text); ctx._source.testxy = m.group(1)"
}

This also fails. Any idea what I'm doing wrong?

Edit: here is the error from running this in the Dev Tools console (with the pattern cut down to /SOURCE/):

{
  "error" : {
    "root_cause" : [
      {
        "type" : "script_exception",
        "reason" : "runtime error",
        "script_stack" : [
          "java.base/java.util.regex.Matcher.group(Matcher.java:644)",
          "ctx._source.testxy = /SOURCE/.matcher(ctx._source.body_text).group(1)",
          "                                                            ^---- HERE"
        ],
        "script" : "ctx._source.testxy = /SOURCE/.matcher(ctx._source.body_text).group(1)",
        "lang" : "painless",
        "position" : {
          "offset" : 60,
          "start" : 0,
          "end" : 69
        }
      }
    ],
    "type" : "script_exception",
    "reason" : "runtime error",
    "script_stack" : [
      "java.base/java.util.regex.Matcher.group(Matcher.java:644)",
      "ctx._source.testxy = /SOURCE/.matcher(ctx._source.body_text).group(1)",
      "                                                            ^---- HERE"
    ],
    "script" : "ctx._source.testxy = /SOURCE/.matcher(ctx._source.body_text).group(1)",
    "lang" : "painless",
    "position" : {
      "offset" : 60,
      "start" : 0,
      "end" : 69
    },
    "caused_by" : {
      "type" : "illegal_state_exception",
      "reason" : "No match found"
    }
  },
  "status" : 400
}

Confusing: it reports "No match found", yet I can remove the target text with /SOURCE.*?\\n/.matcher(ctx._source.body_text).replaceAll('').

CodePudding user response：

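Found the solution. A Matcher holds no match until you ask it to perform one, so you have to call matcher.find() or matcher.matches() before you can invoke .group(); otherwise group() throws the "No match found" IllegalStateException shown above. replaceAll() worked because it resets the matcher and runs the search itself. Also, group(1) needs a capturing group in the pattern, and the patterns above have none, so the script below uses group(0), the whole match: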

scr = {
    "lang": "painless",
    "source": "Matcher m = /(?<=SOURCE:).*?(?=\\n)/.matcher(ctx._source.body_text); boolean b = m.find(); ctx._source.testxy = m.group(0)"
}
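
A rough way to sanity-check the pattern locally before sending it to the cluster, plus a guarded variant of the script, is sketched below. It assumes Python's re module treats this particular pattern the same way Java's regex engine does (reasonable for this simple lookbehind/lookahead, but not guaranteed in general). build_extract_script is a hypothetical helper, not part of the elasticsearch client. The guard skips documents that have no SOURCE: line instead of letting group(0) throw, using ctx.op = 'noop' to leave such documents unchanged.

```python
import re

# Sample value from the question.
body_text = (
    "Purpose prong invitations Homely wine pocketses\n"
    "SOURCE: THE NY TIMES, NEW YORK\n"
    "Reaches stealing jambags Azog pull ask"
)

# The answer's pattern, written as a Python raw string for a local check.
pattern = re.compile(r"(?<=SOURCE:).*?(?=\n)")

# search() plays the role of Matcher.find(): it attempts the match and
# returns None when there is none, so group(0) is only read on success.
match = pattern.search(body_text)
extracted = match.group(0) if match else None
print(repr(extracted))  # the match keeps the space that follows "SOURCE:"


def build_extract_script(src_field, dest_field):
    """Hypothetical helper: build a Painless script dict with a find() guard."""
    source = (
        # "\\n" in Python puts backslash-n (not a raw line break) in the script.
        f"Matcher m = /(?<=SOURCE:).*?(?=\\n)/.matcher(ctx._source.{src_field}); "
        f"if (m.find()) {{ ctx._source.{dest_field} = m.group(0) }} "
        "else { ctx.op = 'noop' }"
    )
    return {"lang": "painless", "source": source}


scr = build_extract_script("body_text", "testxy")
print(scr["source"])
```

The resulting scr would be passed exactly as before: es.update_by_query(index='testrss', query=qry, script=scr). Since the match keeps the space after "SOURCE:", widening the lookbehind to (?<=SOURCE: ) is one way to drop it.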