It seems like there is a minimum number of characters needed to get results from Elasticsearch for a specific property I am searching. It is called 'guid' and has the following configuration:
"guid": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
I have a document with the following GUID: 3e49996c-1dd8-4230-8f6f-abe4236a6fc4
The following query returns the document as expected:
{"match":{"query":"9996c-1dd8*","fields":["guid"]}}
However this query does not:
{"match":{"query":"9996c-1dd*","fields":["guid"]}}
I have the same result with multi_match and query_string queries. I haven't been able to find anything in the documentation about a character minimum, so what is happening here?
Answer:
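Elasticsearch does not require a minimum number of characters. What matters is which tokens the analyzer generates, both when the document is indexed and when the query text is analyzed.
A useful exercise is to run the _analyze API against the field to see which tokens end up in the index: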
GET index_001/_analyze
{
"field": "guid",
"text": [
"3e49996c-1dd8-4230-8f6f-abe4236a6fc4"
]
}
For the value 3e49996c-1dd8-4230-8f6f-abe4236a6fc4, the default standard analyzer produces these tokens:
"tokens" : [
{
"token" : "3e49996c",
"start_offset" : 0,
"end_offset" : 8,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "1dd8",
"start_offset" : 9,
"end_offset" : 13,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "4230",
"start_offset" : 14,
"end_offset" : 18,
"type" : "<NUM>",
"position" : 2
},
{
"token" : "8f6f",
"start_offset" : 19,
"end_offset" : 23,
"type" : "<ALPHANUM>",
"position" : 3
},
{
"token" : "abe4236a6fc4",
"start_offset" : 24,
"end_offset" : 36,
"type" : "<ALPHANUM>",
"position" : 4
}
]
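To make the tokenization concrete, here is a rough Python model of it. This is a sketch, not Elasticsearch code: it assumes the field uses the default standard analyzer and that the input is plain ASCII letters, digits and punctuation. Under that assumption, splitting on every non-alphanumeric character and lowercasing reproduces the tokens, offsets and positions shown above for this GUID. The `analyze` function is a hypothetical helper; the real standard tokenizer follows Unicode word-boundary rules and differs on other inputs.

```python
import re

# Hypothetical, ASCII-only approximation of the standard analyzer:
# runs of letters/digits become tokens, everything else ("-", "*", ...)
# acts as a separator and is dropped, and tokens are lowercased.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+")


def analyze(text):
    """Return token dicts shaped like the _analyze API response."""
    tokens = []
    for position, m in enumerate(TOKEN_RE.finditer(text)):
        word = m.group()
        tokens.append({
            "token": word.lower(),
            "start_offset": m.start(),
            "end_offset": m.end(),
            # Pure digit runs are typed <NUM>, mixed runs <ALPHANUM>
            "type": "<NUM>" if word.isdigit() else "<ALPHANUM>",
            "position": position,
        })
    return tokens


if __name__ == "__main__":
    for t in analyze("3e49996c-1dd8-4230-8f6f-abe4236a6fc4"):
        print(t["token"], t["start_offset"], t["end_offset"], t["type"], t["position"])
```

The first line printed is `3e49996c 0 8 <ALPHANUM> 0`, matching the first entry of the _analyze output.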
By default, the analyzer used at index time is also applied to the query text at search time. This is what happens when you search for "9996c-1dd8*":
GET index_001/_analyze
{
"field": "guid",
"text": [
"9996c-1dd8*"
]
}
The generated tokens are:
{
"tokens" : [
{
"token" : "9996c",
"start_offset" : 0,
"end_offset" : 5,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "1dd8",
"start_offset" : 6,
"end_offset" : 10,
"type" : "<ALPHANUM>",
"position" : 1
}
]
}
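The inverted index contains the token 1dd8, and the query text "9996c-1dd8*" also produces the token 1dd8, so the document matches. In a match query the * is simply dropped by the tokenizer rather than treated as a wildcard, and 9996c does not match 3e49996c because tokens are compared in full.
When you search for "9996c-1dd*", none of the generated tokens matches an indexed token, so there are no results: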
GET index_001/_analyze
{
"field": "guid",
"text": [
"9996c-1dd*"
]
}
Tokens:
{
"tokens" : [
{
"token" : "9996c",
"start_offset" : 0,
"end_offset" : 5,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "1dd",
"start_offset" : 6,
"end_offset" : 9,
"type" : "<ALPHANUM>",
"position" : 1
}
]
}
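The token 1dd is not equal to 1dd8, and 9996c is not equal to 3e49996c, so no query token matches an indexed token and the document is not returned.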
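The hit and the miss can be reproduced with a small Python model. It is only a sketch under two assumptions: the field uses the default standard analyzer (approximated here for ASCII input by splitting on non-alphanumeric characters and lowercasing), and the match query uses its default OR operator, so a single shared token is enough for a hit. The helpers `analyze` and `matching_tokens` are hypothetical names, not Elasticsearch APIs.

```python
import re


# Hypothetical, ASCII-only model of the standard analyzer:
# runs of letters/digits become lowercase tokens; "-" and "*" are separators.
def analyze(text):
    return [word.lower() for word in re.findall(r"[A-Za-z0-9]+", text)]


def matching_tokens(indexed_value, query_text):
    """Model a match query with the default OR operator: the document is a hit
    if ANY query token is exactly equal to an indexed token. Tokens are compared
    whole, so there is no prefix or substring matching."""
    indexed = set(analyze(indexed_value))
    return sorted(set(analyze(query_text)) & indexed)


GUID = "3e49996c-1dd8-4230-8f6f-abe4236a6fc4"

if __name__ == "__main__":
    for query in ["9996c-1dd8*", "9996c-1dd*"]:
        hits = matching_tokens(GUID, query)
        print(query, "->", analyze(query), "| shared tokens:", hits or "none")
```

Running it reports the shared token 1dd8 for "9996c-1dd8*" and none for "9996c-1dd*", mirroring the two search results.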