I am new to Elasticsearch and have been reading a lot about it, but I am stuck on one requirement.
Consider a field of type text, "app_data", that is present in all the documents of an index.
The app_data field always stores a single word, but that word can be alphanumeric, numeric, or alphabetic.
Requirement -
One type of word stored in app_data looks something like -
app_data:"99IPAB999999FG"
Now if the user wants to search for this app_data they enter something like
"99.IPAB.99999.9"
Another Example - Data in index
app_data:"78IGDB900459JI"
User searches like - "78.IGDB.90045.9"
How should I form an ES query to match the data stored in the index docs, if this is feasible?
Considerations -
- I cannot edit the data (using a custom analyser) during insertion to the index as app_data can have simple words like "RED", "RED567".
- Only for the problem mentioned above, I think I have to use a custom analyser along with query DSL.
CodePudding user response:
Assuming that your data is already indexed and cannot be changed, my suggestion is to normalize the term before sending it to the query by removing the "." characters: 99.IPAB.99999.9 -> 99ipab999999.
With that, match_phrase_prefix will succeed, because the cleaned-up term is a prefix of the indexed token (99ipab999999fg).
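As a rough sketch of that client-side approach (Python here; the function names normalize_app_data and build_prefix_query are made up for illustration and are not part of any Elasticsearch client), the dots can be stripped before the query body is built:

```python
import re


def normalize_app_data(user_input: str) -> str:
    """Drop the '.' separators and lowercase what is left.

    Lowercasing only mirrors the example above; the standard analyzer
    on the field would lowercase the query term anyway.
    """
    return re.sub(r"\.", "", user_input).lower()


def build_prefix_query(user_input: str, field: str = "app_data") -> dict:
    """Build a match_phrase_prefix request body for the cleaned-up term.

    The field name defaults to the one from the question; adjust as needed.
    """
    return {
        "query": {
            "match_phrase_prefix": {
                field: {"query": normalize_app_data(user_input)}
            }
        }
    }
```

Plain values such as "RED" or "RED567" contain no dots, so they pass through unchanged apart from the lowercasing. The returned dict can be sent as the JSON body of a _search request.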
If you cannot apply the pattern to the input in your application, you can do it at search time with an analyzer.
The idea is to create an analyzer that generates the token without the ".". In your query, add "analyzer": "my_analyzer" so that the search term is tokenized without the "." and the match works.
New analyzer:
PUT my-index-000001
{
  "settings": {
    "analysis": {
      "char_filter": {
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": """\.""",
          "replacement": ""
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase"
          ],
          "char_filter": [
            "my_char_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "app_data": {
        "type": "text",
        "analyzer": "standard"
      }
    }
  }
}
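To check what my_analyzer actually emits for a given input, the _analyze API can be called against the index. Below is a minimal sketch using only the Python standard library; it assumes a node reachable at http://localhost:9200 without authentication (an assumption, adjust for your cluster), and the helper names are hypothetical.

```python
import json
import urllib.request


def build_analyze_body(text: str, analyzer: str = "my_analyzer") -> dict:
    """Request body for POST /<index>/_analyze."""
    return {"analyzer": analyzer, "text": text}


def tokens_from_response(response: dict) -> list:
    """Pull the token strings out of an _analyze response."""
    return [t["token"] for t in response.get("tokens", [])]


def analyze(text: str,
            index: str = "my-index-000001",
            base_url: str = "http://localhost:9200") -> list:
    """Send the text through the index's analyzer and return its tokens.

    base_url is an assumed local, unsecured node; not called on import.
    """
    req = urllib.request.Request(
        f"{base_url}/{index}/_analyze",
        data=json.dumps(build_analyze_body(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return tokens_from_response(json.load(resp))
```

Calling analyze("78.IGDB.90045.9") against the index defined above should come back with a single lowercase token with the dots removed, which is what lets the prefix match line up with the indexed value.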
Sample data and query:
POST my-index-000001/_bulk
{"index":{}}
{"app_data":"99IPAB999999FG"}
{"index":{}}
{"app_data":"78IGDB900459JI"}
POST my-index-000001/_search
{
  "from": 0,
  "size": 5,
  "query": {
    "match_phrase_prefix": {
      "app_data": {
        "query": "78.IGDB.90045.9",
        "analyzer": "my_analyzer"
      }
    }
  }
}
Hits:
"hits": [
  {
    "_index": "my-index-000001",
    "_id": "_jNxfoUBQB-6H-4Z6KWM",
    "_score": 0.6931471,
    "_source": {
      "app_data": "78IGDB900459JI"
    }
  }
]