In my textual data, I have structures like this:
ст. ст. 40, 131, 132, 176-178, 183, ч. 2 ст. 187, 188, 184, 189, 194 KK
Here KK is the name of a code (codex), "ст." and its plural "ст. ст." mean "article(s)", and "ч." means "part". I want Elasticsearch to find such strings using a regular expression and run a script that processes each one, so that I get tokens like these:
40 KK, 131 KK, ..., 194 KK.
How can I do this in Elasticsearch?
Answer:
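Here is a script I wrote; it can probably be improved further. It runs as a script processor in an ingest pipeline, so it is invoked at indexing time and stores the formatted values in a tags field. The request below tests it on a sample document with the _simulate API: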
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "script": {
          "description": "Sample handle text",
          "lang": "painless",
          "source": """
            String[] envSplit = ctx['env'].splitOnToken(',');
            ArrayList tags = new ArrayList();
            for (int i = 0; i < envSplit.length; i++) {
              // Remove the part marker (only 'ч. 2' is handled here) and the
              // article markers, then trim the spaces left around the number
              String value = envSplit[i].replace('ч. 2', '').replace('ст. ', '').trim();
              // Only the last item carries the code name, so append it to the others
              if (!value.contains('KK')) {
                value = value + ' KK';
              }
              tags.add(value);
            }
            ctx['tags'] = tags;
          """
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "env": "ст. ст. 40, 131, 132, 176-178, 183, ч. 2 ст. 187, 188, 184, 189, 194 KK"
      }
    }
  ]
}
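If you want to prototype the logic outside Elasticsearch first (or run it in your application before indexing), the following Python sketch covers both steps from the question: a regular expression that finds citations in free text, and a function that turns one citation into "<article> <code>" tokens. The citation grammar, the names find_citations and citation_tokens, and the choice to expand ranges such as 176-178 into individual articles are my own assumptions, not part of the pipeline above.

```python
import re

# Assumed citation grammar (illustrative only, not an Elasticsearch feature):
# optional "ч. <n>" (part), one or more "ст." (article) markers, then a number or "a-b" range.
_PREFIX = r"(?:ч\.\s*\d+\s+)?(?:ст\.\s*)+"
_ITEM = rf"(?:{_PREFIX})?\d+(?:-\d+)?"
# A citation: a first item, further comma-separated items, then an upper-case code name like "KK".
CITATION_RE = re.compile(
    rf"{_PREFIX}\d+(?:-\d+)?(?:\s*,\s*{_ITEM})*\s+[A-ZА-ЯЁ]{{2,}}\b"
)
# Assumption: the article number (or range) is the last thing in each comma-separated item.
_ARTICLE_AT_END = re.compile(r"(\d+)(?:\s*-\s*(\d+))?\s*$")


def find_citations(text: str) -> list[str]:
    """Return every substring of text that matches the assumed citation grammar."""
    return [m.group(0) for m in CITATION_RE.finditer(text)]


def citation_tokens(citation: str, expand_ranges: bool = True) -> list[str]:
    """Turn one citation into tokens such as '40 KK', '176 KK', '177 KK', ..."""
    # The code name ("KK") is taken to be the last whitespace-separated word.
    body, code = citation.strip().rsplit(maxsplit=1)
    tokens = []
    for item in body.split(","):
        match = _ARTICLE_AT_END.search(item)
        if match is None:
            continue  # skip fragments without an article number
        start, end = match.group(1), match.group(2)
        if end is None:
            tokens.append(f"{start} {code}")
        elif expand_ranges:
            # "176-178" becomes 176, 177 and 178
            tokens.extend(f"{n} {code}" for n in range(int(start), int(end) + 1))
        else:
            tokens.append(f"{start}-{end} {code}")
    return tokens


if __name__ == "__main__":
    text = "ст. ст. 40, 131, 132, 176-178, 183, ч. 2 ст. 187, 188, 184, 189, 194 KK"
    for citation in find_citations(text):
        print(citation_tokens(citation))
```

For the sample string this prints ['40 KK', '131 KK', '132 KK', '176 KK', '177 KK', '178 KK', '183 KK', '187 KK', '188 KK', '184 KK', '189 KK', '194 KK']. Taking the last number of each comma-separated item drops part numbers like "ч. 2" without hard-coding them, as the Painless version does. The same idea could be ported to the script processor, but regular expressions in Painless may first have to be allowed through the script.painless.regex.enabled setting, depending on your Elasticsearch version.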