Elastic Search - search the data ignoring periods or


My Elasticsearch index contains documents with CPF values:

  {
    "name": "A",
    "cpf": "718.881.683-23"
  },
  {
    "name": "B",
    "cpf": "404.833.187-60"
  }

I want to search the data by the cpf field as follows:

query: 718
output: doc with name "A"
query: 718.881.683-23
output: doc with name "A"

The above works, but the following does not:

query: 71888168323
output: doc with name "A"

I also want to find the document by its CPF when the query omits the periods and the hyphen.

CodePudding user response:

The standard analyzer tokenizes 718.881.683-23 into 718 881 683 23 (you can confirm the exact tokens for your version with the _analyze API). So by default you will find document A with 718, with 718 881, or with 718 and 23, but not with 7188, because the field contains no such token. You probably want a different analyzer, for example one based on the edge n-gram tokenizer.

You can create a custom analyzer with a character filter, for example a pattern_replace like the following, which strips everything that is not a digit:

"my_char_filter": {
  "type": "pattern_replace",
  "pattern": "[^\\d]",
  "replacement": ""
}

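and an edge n-gram tokenizer (token_chars is assumed to be "digit" here, since only digits remain after the character filter):

"my_tokenizer": {
  "type": "edge_ngram",
  "min_gram": 1,
  "max_gram": 11,
  "token_chars": ["digit"]
}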

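To see why this combination would match a query such as 7188, here is a minimal Python sketch that imitates the two steps outside Elasticsearch. It is only an approximation: the function names are made up for illustration, and it assumes the query text is stripped of non-digits but not split into n-grams itself.

```python
import re

def strip_non_digits(text):
    """Imitate the pattern_replace character filter: drop every non-digit."""
    return re.sub(r"[^\d]", "", text)

def edge_ngrams(text, min_gram=1, max_gram=11):
    """Imitate an edge n-gram tokenizer: emit prefixes of increasing length."""
    upper = min(max_gram, len(text))
    return [text[:n] for n in range(min_gram, upper + 1)]

def index_tokens(cpf):
    """Tokens that would be stored for one CPF value in this simplified model."""
    return set(edge_ngrams(strip_non_digits(cpf)))

tokens = index_tokens("718.881.683-23")

for query in ["718", "7188", "718.881.683-23", "71888168323", "404"]:
    term = strip_non_digits(query)  # assumption: query is only stripped, not n-grammed
    print(query, "->", term in tokens)
```

In this model every digit prefix of the CPF is stored as its own token, so partial and full inputs match with or without punctuation, while a prefix of another CPF (404) does not.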
CodePudding user response:

You can add a custom analyzer that will remove all characters that are not digits and only index the digits.

The analyzer looks like this:

PUT test
{
  "settings": {
    "analysis": {
      "filter": {
        "number_only": {
          "type": "pattern_replace",
          "pattern": "\\D"
        }
      },
      "analyzer": {
        "cpf_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["number_only"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "cpf": {
        "type": "text",
        "analyzer": "cpf_analyzer"
      }
    }
  }
}

Then you can index your documents as usual:

POST test/_doc
{
  "name": "A",
  "cpf": "718.881.683-23"
}

POST test/_doc
{
  "name": "B",
  "cpf": "404.833.187-60"
}

Searching for a prefix like 718 can be done like this:

POST test/_search
{
  "query": {
    "prefix": { "cpf": "718" }
  }
}

Searching for the exact value with non-digit characters can be done like this:

POST test/_search
{
  "query": {
    "match": { "cpf": "718.881.683-23" }
  }
}

And finally, you can also search with numbers only:

POST test/_search
{
  "query": {
    "match": { "cpf": "71888168323" }
  }
}

With the given analyzer, all the above queries will return the document you expect.
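The reason all three queries hit the same document can be sketched in a few lines of Python. This is a simplified model, not Elasticsearch itself: it assumes the keyword tokenizer keeps the whole value as one token, that the match query runs the same analyzer on the query text, and that the prefix query compares the raw query text against the stored term. The helper names are hypothetical.

```python
import re

def cpf_analyzer(text):
    """Model of cpf_analyzer: one token for the whole value, non-digits removed."""
    return re.sub(r"\D", "", text)

docs = {"A": "718.881.683-23", "B": "404.833.187-60"}

# Index time: each document stores a single digits-only term.
index = {name: cpf_analyzer(cpf) for name, cpf in docs.items()}

def match(query):
    """Model of a match query: the query text goes through the analyzer too."""
    term = cpf_analyzer(query)
    return [name for name, stored in index.items() if stored == term]

def prefix(query):
    """Model of a prefix query: the query text is used as-is (assumed not analyzed)."""
    return [name for name, stored in index.items() if stored.startswith(query)]

print(prefix("718"))
print(match("718.881.683-23"))
print(match("71888168323"))
```

Under these assumptions a prefix that still contains punctuation, such as 718.881, would not match, because the stored term holds digits only; stripping the input on the client side before sending a prefix query avoids that.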

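If you cannot recreate your index for whatever reason, you can create a sub-field with the right analyzer and update your data in place. This assumes cpf_analyzer is already defined in the index settings; adding an analyzer to an existing index requires closing the index, updating its settings, and reopening it.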

PUT test/_mapping
{
  "properties": {
    "cpf": {
      "type": "text",
      "fields": {
        "numeric": {
          "type": "text",
          "analyzer": "cpf_analyzer"
        }
      }
    }
  }
}

And then simply run the following command which will reindex all the data in place and populate the cpf.numeric field:

POST test/_update_by_query

All your searches will then need to be done on the cpf.numeric field instead of cpf directly.
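As a rough illustration of what searching on cpf.numeric could look like from application code, the following Python sketch builds the request body for either a prefix or a match query. The helper build_cpf_query is hypothetical, and the rule of switching to a prefix query when fewer than 11 digits are typed is an assumption of this sketch, not part of the answer above.

```python
import json
import re

CPF_LENGTH = 11  # a full CPF has 11 digits

def build_cpf_query(user_input, field="cpf.numeric"):
    """Build a search body for the numeric sub-field (illustrative helper)."""
    digits = re.sub(r"\D", "", user_input)  # strip periods, hyphens, spaces
    if len(digits) < CPF_LENGTH:
        # Partial input: look for terms starting with the typed digits.
        return {"query": {"prefix": {field: digits}}}
    # Full input: ask for the exact value.
    return {"query": {"match": {field: digits}}}

print(json.dumps(build_cpf_query("718.8"), indent=2))
print(json.dumps(build_cpf_query("718.881.683-23"), indent=2))
```

The resulting dictionary can be sent as the JSON body of POST test/_search with whichever HTTP or Elasticsearch client you already use.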
