ElasticSearch - Search without apostrophe

I'm trying to allow users to search without entering an apostrophe.

E.g. typing Johns should still bring up results for John's.

I've tried multiple things including adding the stemmer filter but with no luck.

I thought I could do something manual, such as:

GET /_analyze
{
  "char_filter": [{
      "type": "pattern_replace",
      "pattern": "\\s*([a-zA-Z0-9] )\\'s",
      "replacement": "$1 $1s $1's "
  }],
  "tokenizer": "standard",
  "text": "john's dog jumped"
}

And I get the following response:

{
  "tokens" : [
    {
      "token" : "john",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "johns",
      "start_offset" : 5,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "john's",
      "start_offset" : 5,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "dog",
      "start_offset" : 7,
      "end_offset" : 10,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "jumped",
      "start_offset" : 11,
      "end_offset" : 17,
      "type" : "<ALPHANUM>",
      "position" : 4
    }
  ]
}

However, I still don't get a match when I search for "johns" without the apostrophe.

My settings look like:

          "analyzer" : {
            "my_custom_search" : {
              "char_filter" : [ "flexible_plurals" ],
              "tokenizer" : "standard"
            }
          },
          "char_filter" : {
            "flexible_plurals" : {
              "pattern" : """\s*([a-zA-Z0-9] )\'s""",
              "type" : "pattern_replace",
              "replacement" : " $1 $1s $1's "
            }
          }

My mappings look like:

                "search-terms" : {
                  "type" : "text",
                  "analyzer" : "my_custom_search"
                }

I am using the match query to query the data.

CodePudding user response：

You are almost there. I assume you are using the match query and have defined your field as text with the custom analyzer. If the text field does not use the custom analyzer that contains your char_filter, it falls back to the standard analyzer, which does not generate the johns token, hence no match.
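To make that reasoning concrete, here is a rough Python sketch (not Elasticsearch code) of how a match query decides on a hit: the indexed text and the query text are each turned into tokens, and the document matches only if they share at least one token. The custom-analyzer tokens are copied from the _analyze output in the question; the single standard-analyzer token is what I would expect for the same text. The helper name `matches` is made up for illustration.

```python
# Simplified model of a match query: a hit requires at least one token
# shared between the indexed field and the analyzed query.
# `matches` is a hypothetical helper, not an Elasticsearch API.
def matches(indexed_tokens, query_tokens):
    return bool(set(indexed_tokens) & set(query_tokens))

# Tokens produced for the document text "john's" under each analyzer.
standard_tokens = ["john's"]                 # standard analyzer: one token
custom_tokens = ["john", "johns", "john's"]  # with the apostrophe char filter

# The query "johns" contains no 's, so the char filter leaves it unchanged.
query_tokens = ["johns"]

print(matches(standard_tokens, query_tokens))  # False
print(matches(custom_tokens, query_tokens))    # True
```

So the extra tokens only help if the field was actually indexed with the custom analyzer.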

Complete Working example

Index setting and mapping

{
    "settings": {
        "index": {
            "analysis": {
                "char_filter": {
                    "apostrophe_filter": {
                        "type": "pattern_replace",
                        "pattern": "\\s*([a-zA-Z0-9] )\\'s",
                        "replacement": "$1 $1s $1's "
                    }
                },
                "analyzer": {
                    "custom_analyzer": {
                        "filter": [
                            "lowercase"
                        ],
                        "char_filter": [
                            "apostrophe_filter"
                        ],
                        "tokenizer": "standard"
                    }
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "custom_analyzer"
            }
        }
    }
}
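The analyzer runs its stages in a fixed order: char filters first, then the tokenizer, then token filters. Below is a small Python approximation of that pipeline, under a few assumptions: Python's `re` stands in for the Java regex engine (so `\1` replaces `$1`), whitespace splitting stands in for the standard tokenizer (close enough for this input, not in general), and the replacement keeps the leading space used in the question's settings so the expanded words are not glued to a preceding word. The function name `analyze` is illustrative only.

```python
import re

# Python stand-in for the pattern_replace char filter. The "+" after the
# character class is needed so the whole word before 's is captured.
APOSTROPHE = re.compile(r"\s*([a-zA-Z0-9]+)'s")

def analyze(text):
    # 1. char filter: expand "X's" into "X Xs X's"
    expanded = APOSTROPHE.sub(r" \1 \1s \1's ", text)
    # 2. tokenizer: whitespace split (rough approximation of "standard")
    tokens = expanded.split()
    # 3. token filter: lowercase
    return [token.lower() for token in tokens]

print(analyze("John's dog jumped"))
# ['john', 'johns', "john's", 'dog', 'jumped']
```

Because `\s*` also consumes the whitespace in front of the word, a replacement without a leading space would join the expanded word to whatever precedes it when the possessive is not at the start of the text.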

Index sample document

{
   "title" : "john's"
   
}

And search for johns

{
    "query": {
        "match": {
            "title": "johns"
        }
    }
}

Search results

 "hits": [
            {
                "_index": "72937076",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.2876821,
                "_source": {
                    "title": "john's" --> note `john's`
                }
            }
        ]