I have an email field in documents saved in an Elasticsearch index. I am able to search by the value before the @, but I can't find anything by searching for the domain.
For example, the query below returns nothing:
GET transaction-green/_search
{
  "query": {
    "match": {
      "email": "gmail"
    }
  },
  "_source": {
    "includes": [
      "email"
    ]
  }
}
but it returns the document if I search for test@gmail.com or just test.
The mapping for this email field is the default text type:
"email" : {
  "type" : "text",
  "fields" : {
    "keyword" : {
      "type" : "keyword",
      "ignore_above" : 256
    }
  }
}
Why is the domain name ignored when searching?
CodePudding user response:
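This happens because of the standard analyzer. Since the field uses the default analyzer, the address is split at the @, but the domain gmail.com is kept as a single token, so a search for gmail alone matches nothing.
You can check the analyzer output with the _analyze API: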
POST email/_analyze
{
  "analyzer": "standard",
  "text": ["test@gmail.com"]
}
{
  "tokens" : [
    {
      "token" : "test",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "gmail.com",
      "start_offset" : 5,
      "end_offset" : 14,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}
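To make the effect on the match query concrete, here is a minimal Python sketch. It only approximates the standard analyzer with a regular expression (it is not Elasticsearch's real tokenizer), and the helper names approx_standard_analyze and approx_match are hypothetical. It assumes the indexed address is test@gmail.com, which is consistent with the token offsets in the output above.

```python
import re

# Rough stand-in for the standard tokenizer (an assumption, not the real
# implementation): runs of letters/digits, where a dot between two
# alphanumeric runs does NOT split the token.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*")


def approx_standard_analyze(text):
    """Approximate the standard analyzer: tokenize, then lowercase."""
    return [tok.lower() for tok in TOKEN_RE.findall(text)]


def approx_match(query, indexed_value):
    """Approximate a match query with the default OR operator:
    the query text is analyzed too, and the document matches if any
    query term is among the indexed terms."""
    indexed_terms = set(approx_standard_analyze(indexed_value))
    return any(t in indexed_terms for t in approx_standard_analyze(query))


indexed = "test@gmail.com"
print(approx_standard_analyze(indexed))          # ['test', 'gmail.com']
print(approx_match("gmail", indexed))            # False
print(approx_match("test", indexed))             # True
print(approx_match("test@gmail.com", indexed))   # True
```

The term gmail is never stored in the index, only gmail.com, which is why searching for the full address or for test succeeds while gmail alone does not.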
You can define a custom analyzer with a character filter that replaces each dot with a space, as below, so that gmail becomes its own token and your query will work:
PUT /email
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_char_filter"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": "\\.",
          "replacement": " "
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "email": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}
Now you can analyze the value with this analyzer, and you will see that it creates 3 separate tokens for the email:
POST email/_analyze
{
  "analyzer": "my_analyzer",
  "text": ["test@gmail.com"]
}
{
  "tokens" : [
    {
      "token" : "test",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "gmail",
      "start_offset" : 5,
      "end_offset" : 10,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "com",
      "start_offset" : 11,
      "end_offset" : 14,
      "type" : "<ALPHANUM>",
      "position" : 2
    }
  ]
}
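As a rough illustration of what the character filter changes, the following Python sketch applies the same dot-to-space replacement before a regex-based approximation of the standard tokenizer. The function names are hypothetical and the regex is only an assumption about the tokenizer's behavior for simple addresses like test@gmail.com, not the real implementation.

```python
import re

# Approximation of the standard tokenizer (assumption): alphanumeric runs,
# with a dot between two runs kept inside the token.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*")


def my_char_filter(text):
    """Mirror of the pattern_replace char filter: every '.' becomes a space."""
    return re.sub(r"\.", " ", text)


def approx_my_analyzer(text):
    """Char filter first, then tokenize. There is no lowercase step,
    because my_analyzer as defined above has no token filters."""
    return TOKEN_RE.findall(my_char_filter(text))


print(approx_my_analyzer("test@gmail.com"))               # ['test', 'gmail', 'com']
print("gmail" in set(approx_my_analyzer("test@gmail.com")))  # True
print("gmail" in set(approx_my_analyzer("Test@Gmail.com")))  # False
```

The last line points at a side effect worth checking: a custom analyzer contains only the components you list, so unlike the standard analyzer, my_analyzer does not lowercase tokens. If case-insensitive matching is wanted, adding "filter": ["lowercase"] to my_analyzer is one option.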