Why can't I search email domain name when using `text` type in Elasticsearch

Time:06-16

I have an email field in documents saved in an Elasticsearch index. I can search by the value before the @, but I can't find anything when searching by the domain.

For example, the query below returns nothing:

GET transaction-green/_search
{
  "query": {
    "match": {
      "email": "gmail"
    }
  },
  "_source": {
    "includes": [
      "email"
    ]
  }
}

but it returns the document if I search for test@gmail.com or just test.

The mapping for this email field is the default text type:

"email" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }

Why is the domain name ignored when searching?

CodePudding user response:

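This happens because of the standard analyzer. Since the field uses the default analyzer, the address is split at the @, but the domain gmail.com is kept as a single token, so a match query for gmail finds no matching token.

You can use the _analyze API to check how the analyzer tokenizes a value: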

POST email/_analyze
{
  "analyzer": "standard",
  "text": ["test@gmail.com"]
}
{
  "tokens" : [
    {
      "token" : "test",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "gmail.com",
      "start_offset" : 5,
      "end_offset" : 14,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}
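
As a quick check that needs no mapping change, searching for the full domain should match, because the standard analyzer is applied to the query text as well and `gmail.com` is exactly the token stored in the index. A minimal sketch, assuming the `transaction-green` index from the question:

```
GET transaction-green/_search
{
  "query": {
    "match": {
      "email": "gmail.com"
    }
  }
}
```

This only matches the whole domain token; a partial term such as gmail still finds nothing with this mapping.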

You can define a custom analyzer with a character filter, as below, which replaces each dot with a space before tokenizing, and your query will work:

PUT /email
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_char_filter"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": "\\.",
          "replacement": " "
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "email":{
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}
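
Note that the analyzer of an existing field cannot be changed in place, so documents already stored in another index need to be copied into the new one. A minimal sketch using the _reindex API, assuming the source is the `transaction-green` index from the question and the destination is the `email` index created above:

```
POST _reindex
{
  "source": {
    "index": "transaction-green"
  },
  "dest": {
    "index": "email"
  }
}
```

Fields other than email are not declared in the new mapping, so they would be mapped dynamically during the copy.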

Now you can analyze the value with the new analyzer, and you will see that it creates 3 separate tokens for the email:

POST email/_analyze
{
  "analyzer": "my_analyzer",
  "text": ["test@gmail.com"]
}
{
  "tokens" : [
    {
      "token" : "test",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "gmail",
      "start_offset" : 5,
      "end_offset" : 10,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "com",
      "start_offset" : 11,
      "end_offset" : 14,
      "type" : "<ALPHANUM>",
      "position" : 2
    }
  ]
}
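
With documents indexed into the new index, the original query for the bare domain name should now return them. A sketch, assuming the `email` index created above already holds the documents:

```
GET email/_search
{
  "query": {
    "match": {
      "email": "gmail"
    }
  },
  "_source": {
    "includes": [
      "email"
    ]
  }
}
```

Because the same analyzer runs on the query text, searching for test@gmail.com still matches. With the default OR operator of the match query, it matches any document that shares at least one of the tokens test, gmail, or com, so it can also return other gmail addresses.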