I've run into a weird issue: a match_bool_prefix query (with one term) behaves differently from the equivalent prefix query. From what I understand, match_bool_prefix should analyze my query and then create a multi-term query with each term and the last should be a prefix query. In my case, the query text is part of an email address and ends with an @. Here's my example:
Create Index
curl --location --request PUT 'http://localhost:9200/testindex' \
--header 'Content-Type: application/json' \
--data-raw '{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "uax_url_email",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "email": {
        "type": "text"
      }
    }
  }
}'
Add data
curl --location --request PUT 'http://localhost:9200/testindex/_doc/1' \
--header 'Content-Type: application/json' \
--data-raw '{
"email":"[email protected]"
}'
Failing Query
curl --location --request POST 'http://localhost:9200/testindex/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
  "query": {
    "match_bool_prefix": {
      "email": "tester@"
    }
  }
}'
Working Query
curl --location --request POST 'http://localhost:9200/testindex/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
  "query": {
    "prefix": {
      "email": {
        "value": "tester@"
      }
    }
  }
}'
Querying with the word 'tester' works with both queries, which shows that a prefix match is being applied. Using another analyzer (keyword or whitespace) on the match_bool_prefix query also works correctly, which makes me think Elasticsearch isn't doing the right thing. According to the docs, match_bool_prefix should analyze the query into tokens, which in my case would strip off the @, as this _analyze request shows:
curl --location --request POST 'http://localhost:9200/testindex/_analyze' \
--header 'Content-Type: application/json' \
--data-raw '{
  "explain": "false",
  "analyzer": "default",
  "text": "tester@"
}'
Results
{
  "tokens": [
    {
      "token": "tester",
      "start_offset": 0,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 0
    }
  ]
}
So this is what the match_bool_prefix query should effectively be rewritten to, yet this version works fine:
{
  "query": {
    "bool": {
      "should": [
        { "prefix": { "email": "tester" } }
      ]
    }
  }
}
Any help would be appreciated. I'm working on a much larger query, and these results had me questioning whether I was using the match_bool_prefix query correctly.
CodePudding user response:
The match_bool_prefix query has been created specifically to be used with the search_as_you_type field (see #35600).
Since you're searching for email prefixes (or whole emails), you can simply use the prefix query: email addresses are never composed of several terms, especially when you're analyzing them with a uax_url_email tokenizer. So there's no point in using a match_bool_prefix query in this case.
CodePudding user response:
"match_bool_prefix should analyze my query and then create a multi-term query with each term and the last should be a prefix query"
Yes, the match_bool_prefix query analyzes the query text, BUT it uses the analyzer from the queried field's mapping. Keep in mind that the uax_url_email tokenizer does NOT split emails and URLs, thus there will be no matches for your query.
I suggest either changing the field mapping to keyword:
PUT stackindex_002
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "uax_url_email",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "email": {
        "type": "keyword"
      }
    }
  }
}
Or adding an explicit analyzer in the query:
GET testindex/_search
{
  "query": {
    "match_bool_prefix": {
      "email": {
        "query": "tester@",
        "analyzer": "keyword"
      }
    }
  }
}
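To compare what each analyzer ends up feeding into the query, here is a rough Python sketch of the rewrite the documentation describes: every analyzed term except the last becomes a term query, and the last becomes a prefix query. It does not talk to Elasticsearch at all. The helper name match_bool_prefix_rewrite is made up for illustration, and how Elasticsearch itself handles an empty token list is not covered here (the sketch just raises an error).

```python
import json


def match_bool_prefix_rewrite(field, tokens):
    """Build the bool query that the docs describe for match_bool_prefix.

    `tokens` is the list of terms the analyzer produced for the query text,
    for example copied by hand from an _analyze response. This helper is
    hypothetical: it only mirrors the documented rewrite and never calls
    Elasticsearch.
    """
    if not tokens:
        # Assumption: empty analysis output is out of scope for this sketch.
        raise ValueError("analysis produced no tokens")
    *leading, last = tokens
    # Every term except the last becomes a term query ...
    should = [{"term": {field: token}} for token in leading]
    # ... and the last term becomes a prefix query.
    should.append({"prefix": {field: last}})
    return {"query": {"bool": {"should": should}}}


if __name__ == "__main__":
    # "tester" is the token the question's _analyze call returned for "tester@";
    # "tester@" is what the keyword analyzer emits, since it keeps the input
    # as a single unchanged token.
    for tokens in (["tester"], ["tester@"]):
        print(json.dumps(match_bool_prefix_rewrite("email", tokens), indent=2))
```

Running it prints one bool query per token list, so you can see that the only difference between the two cases is the value handed to the prefix clause ("tester" versus "tester@").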