For example:
- when indexing a document into Elasticsearch;
- I want to analyze a field named description in the document with the uax_url_email tokenizer/analyzer;
- if description contains any URL, put the URLs into another field named urls (an array);
- finish indexing the document.
Now I can check whether the urls field is empty to know whether description has any URL.
Is this possible? Or does an analyzer only contribute to the inverted index and not to other fields?
CodePudding user response:
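An analyzer only produces the tokens stored in the inverted index; it cannot write values into other fields of the document. You can do this at index time with an ingest pipeline instead, using a script processor with a Painless script. The following request simulates such a pipeline: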
POST _ingest/pipeline/_simulate?verbose
{
  "pipeline": {
    "processors": [
      {
        "script": {
          "description": "Extract 'tags' from 'env' field",
          "lang": "painless",
          "source": """
            // Find every URL in the "content" field and collect the matches in "urls".
            def m = /(http|ftp|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])/.matcher(ctx["content"]);
            ArrayList urls = new ArrayList();
            while (m.find())
            {
              urls.add(m.group());
            }
            ctx['urls'] = urls;
          """
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "content": "My name is Sagar patel and i visit https://apple.com and https://google.com"
      }
    }
  ]
}
The pipeline above produces a result like the following:
{
  "docs": [
    {
      "processor_results": [
        {
          "processor_type": "script",
          "status": "success",
          "description": "Extract 'tags' from 'env' field",
          "doc": {
            "_index": "_index",
            "_id": "_id",
            "_source": {
              "urls": [
                "https://apple.com",
                "https://google.com"
              ],
              "content": "My name is Sagar patel and i visit https://apple.com and https://google.com"
            },
            "_ingest": {
              "pipeline": "_simulate_pipeline",
              "timestamp": "2022-07-13T12:45:00.3655307Z"
            }
          }
        }
      ]
    }
  ]
}
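Once the simulation looks right, the same script can be stored as a named pipeline and applied when indexing. The requests below are a minimal sketch of that, not a definitive setup: the pipeline name extract-urls and the index name my-index are made up for illustration, the field is still called content as in the simulation (use description if that is your field), and the null check is an addition so that documents without the field get an empty urls array instead of failing.

```
# Store the pipeline (the name "extract-urls" is hypothetical)
PUT _ingest/pipeline/extract-urls
{
  "description": "Copy URLs found in 'content' into 'urls'",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": """
          ArrayList urls = new ArrayList();
          // Assumption: skip the regex when the field is missing
          if (ctx['content'] != null) {
            def m = /(http|ftp|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])/.matcher(ctx['content']);
            while (m.find()) {
              urls.add(m.group());
            }
          }
          ctx['urls'] = urls;
        """
      }
    }
  ]
}

# Index a document through the pipeline (the index name "my-index" is hypothetical)
PUT my-index/_doc/1?pipeline=extract-urls
{
  "content": "My name is Sagar patel and i visit https://apple.com and https://google.com"
}

# Find documents whose content contained at least one URL
GET my-index/_search
{
  "query": {
    "exists": {
      "field": "urls"
    }
  }
}
```

The exists query treats an empty array as a missing field, so documents where no URL was found do not match it. To list those instead, wrap the same exists clause in a bool query under must_not.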