Why can't I search by email domain name when using the `text` type in Elasticsearch?

I have an email field in documents saved in an Elasticsearch index. I can search by the part before the @, but searching for the domain finds nothing.

For example, the query below returns nothing:

GET transaction-green/_search
{
  "query": {
    "match": {
      "email": "gmail"
    }
  },
  "_source": {
    "includes": [
      "email"
    ]
  }
}

but it returns the document if I search for test@gmail.com or just test.

The mapping for this email field is the default text type:

"email" : {
  "type" : "text",
  "fields" : {
    "keyword" : {
      "type" : "keyword",
      "ignore_above" : 256
    }
  }
}

Why is the domain name ignored when searching?

CodePudding user response：

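This happens because of the standard analyzer, which text fields use by default. It splits the address at the @ sign but keeps gmail.com as a single token, because a dot between letters is not treated as a word boundary. A match query for gmail looks for the exact token gmail, and that token is not in the index.

You can check how a value is tokenized with the _analyze API: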

POST email/_analyze
{
  "analyzer": "standard", 
  "text": ["test@gmail.com"]
}

{
  "tokens" : [
    {
      "token" : "test",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "gmail.com",
      "start_offset" : 5,
      "end_offset" : 14,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}
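
Since gmail.com is stored as one token, the existing mapping can already be searched by the full domain, without any reindexing. The two requests below are a minimal sketch against the transaction-green index from the question: the first matches the whole token, the second runs a wildcard pattern against the email.keyword subfield. The pattern *@gmail.* is only an illustration. Keyword values are not lowercased, so it is case-sensitive, and a leading wildcard can be slow on a large index.

```console
GET transaction-green/_search
{
  "query": {
    "match": {
      "email": "gmail.com"
    }
  }
}

GET transaction-green/_search
{
  "query": {
    "wildcard": {
      "email.keyword": {
        "value": "*@gmail.*"
      }
    }
  }
}
```

Neither request finds the document for a bare gmail match query, which is what the custom analyzer addresses.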

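The fix is a custom analyzer with a character filter that replaces every dot with a space before tokenizing, so the domain is split into separate tokens. The lowercase filter is included because a custom analyzer, unlike the standard analyzer, does not lowercase tokens unless told to. The analyzer of an existing field cannot be changed, so define it on a new index:

PUT /email
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_char_filter"
          ],
          "filter": [
            "lowercase"
          ]
        }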
      },
      "char_filter": {
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": "\\.",
          "replacement": " "
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "email":{
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}

Running the same address through this analyzer now produces three separate tokens:

POST email/_analyze
{
  "analyzer": "my_analyzer", 
  "text": ["test@gmail.com"]
}

{
  "tokens" : [
    {
      "token" : "test",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "gmail",
      "start_offset" : 5,
      "end_offset" : 10,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "com",
      "start_offset" : 11,
      "end_offset" : 14,
      "type" : "<ALPHANUM>",
      "position" : 2
    }
  ]
}
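
To confirm the behavior end to end, the sketch below indexes one sample document into the email index and repeats the original search. The document ID 1 and the sample address are illustrative assumptions, and refresh=true is used only so that the document is searchable immediately.

```console
PUT email/_doc/1?refresh=true
{
  "email": "test@gmail.com"
}

GET email/_search
{
  "query": {
    "match": {
      "email": "gmail"
    }
  }
}
```

With the analyzer above, the search should now return the document, because gmail is indexed as its own token. Documents that already exist in another index would need to be copied into the new one, for example with the _reindex API, before they become searchable this way.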