What should I do if I need a special analyzer in Elasticsearch?

In my textual data, I have structures like this:

ст. ст. 40, 131, 132, 176-178, 183, ч. 2 ст. 187, 188, 184, 189, 194 KK

Here KK is the name of a codex, ст. ст. or ст. means article, and ч. means part. I want Elasticsearch to find strings like this with a regular expression and run a script on them so that I get tokens like these:

40 KK, 131 KK, ..., 194 KK.

How can I achieve this in Elasticsearch?

CodePudding user response:

Here is a script I wrote; it can probably be improved. You would have to run it in an ingest pipeline at indexing time to produce the formatted data. The request below simulates such a pipeline on your sample string:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "script": {
          "description": "Sample handle text",
          "lang": "painless",
          "source": """
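            // Split the citation on commas; each piece holds one article number or range.
            String[] envSplit = ctx['env'].splitOnToken(',');
            ArrayList tags = new ArrayList();
            for (int i = 0; i < envSplit.length; i++) {
              String value = envSplit[i];
              if (!value.contains('KK')) {
                // Strip the part and article markers, then append the codex name.
                tags.add(value.replace('ч. 2', '')
                  .replace('ст. ', '')
                  .trim() + ' KK');
              } else {
                // The last piece already ends with the codex name.
                tags.add(value.trim());
              }
            }
            ctx['tags'] = tags;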
          """
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "env": "ст. ст. 40, 131, 132, 176-178, 183, ч. 2 ст. 187, 188, 184, 189, 194 KK"
      }
    }
  ]
}
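
Since the question asks for a regular expression, here is a rough sketch of the same splitting logic in plain Python. It runs outside Elasticsearch and is only meant for checking the expected tokens before porting the idea to a Painless script. The function name `article_tags` and both patterns are hypothetical, and the sketch assumes the codex name always comes last in the string and that parts are written as "ч." followed by a number.

```python
import re

# Hypothetical patterns, not part of any Elasticsearch API.
ARTICLE_MARKER = re.compile(r"ст\.\s*")      # "ст." (article), possibly repeated
PART_MARKER = re.compile(r"ч\.\s*\d+\s*")    # "ч. N" (part N), any number


def article_tags(text, codex="KK"):
    """Return one '<article> <codex>' tag per comma-separated item."""
    body = text.strip()
    # Assumption: the codex name ends the string; drop it so that it can be
    # re-attached to every item.
    if body.endswith(codex):
        body = body[: -len(codex)]
    tags = []
    for item in body.split(","):
        # Remove the part marker first, then the article markers.
        item = PART_MARKER.sub("", item)
        item = ARTICLE_MARKER.sub("", item).strip()
        if item:
            tags.append(f"{item} {codex}")
    return tags


sample = "ст. ст. 40, 131, 132, 176-178, 183, ч. 2 ст. 187, 188, 184, 189, 194 KK"
print(article_tags(sample))
```

Unlike the script above, this version removes any "ч. N" prefix, not just "ч. 2". A range such as 176-178 is kept as a single tag ("176-178 KK"); expanding it into separate articles would need an extra step.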