Elasticsearch: using fuzzy search to find abbreviations

Time:06-28

I have indexed text articles that mention company names, such as apple and lemonade, and I am trying to search for these companies by their abbreviations, such as APPL and LMND. Fuzzy search returns other results instead: for example, searching for LMND returns land, which appears in the text, but it never returns lemonade, whichever parameters I try.

First question: is fuzzy search a suitable solution for this kind of search?

Second question: what would be good parameter value ranges for my problem?

UPDATE

I have tried a synonym filter:

{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "synonyms_filter": {
            "type": "synonym",
            "synonyms": [
              "apple,APPL",
              "lemonade,LMND"
            ]
          }
        },
        "analyzer": {
          "synonym_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "synonyms_filter"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "transcript_data": {
        "properties": {
          "words": {
            "type": "nested",
            "properties": {
              "word": {
                "type": "text",
                "search_analyzer":"synonym_analyzer"
              }
            }
          }
        }
      }
    }
  }
}

and for the search I used:

{
  "_source": false,
  "query": {
    "nested": {
      "path": "transcript_data.words",
      "query": {
        "match": {
          "transcript_data.words.word": "lmnd"
        }
      }
    }
  }
}

but it's not working.

CodePudding user response:

I believe the best option for you is to use synonyms; they do exactly what you need.
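To see why fuzzy matching struggles with abbreviations like these, it may help to count the edits involved. Elasticsearch's fuzziness parameter accepts an edit distance of at most 2, so any term further away than that cannot match, whatever the other parameters are. The following is a rough sketch in Python, not Elasticsearch's own implementation: it uses plain Levenshtein distance (Elasticsearch additionally counts a transposition as one edit by default, which does not change the numbers for these words), and the function name is just an illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance: insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


MAX_FUZZINESS = 2  # largest edit distance the fuzziness parameter accepts

# Terms are compared in lowercase, as they would be after a lowercase filter.
for query, term in (("lmnd", "land"), ("lmnd", "lemonade"), ("appl", "apple")):
    d = edit_distance(query, term)
    verdict = "within reach" if d <= MAX_FUZZINESS else "out of reach"
    print(f"{query} -> {term}: {d} edits ({verdict})")
```

Under these assumptions, land is one edit away from lmnd while lemonade is four edits away, which is consistent with the results described in the question. A synonym rule does not depend on spelling similarity at all, so it avoids the problem.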

I'll leave an example and the link to an article explaining some details.

PUT teste
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "synonyms_filter": {
            "type": "synonym",
            "synonyms": [
              "apple,APPL",
              "lemonade,LMND"
            ]
          }
        },
        "analyzer": {
          "synonym_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "synonyms_filter"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "transcript_data": {
        "properties": {
          "words": {
            "type": "nested",
            "properties": {
              "word": {
                "type": "text",
                "analyzer":"synonym_analyzer"
              }
            }
          }
        }
      }
    }
  }
}

POST teste/_bulk
{"index":{}}
{"transcript_data": {"words":{"word":"apple"}}}


GET teste/_search
{
  "query": {
    "nested": {
      "path": "transcript_data.words",
      "query": {
        "match": {
          "transcript_data.words.word": "appl"
        }
      }
    }
  }
}
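If the list of abbreviations grows, writing the synonyms array by hand gets tedious. As a small sketch, the analysis settings used above could be generated from a lookup table. The ABBREVIATIONS dictionary and both function names are hypothetical; only the filter and analyzer layout comes from the example above.

```python
import json

# Hypothetical lookup table (abbreviation -> company name); replace with real data.
ABBREVIATIONS = {"APPL": "apple", "LMND": "lemonade"}


def synonym_rules(abbreviations: dict[str, str]) -> list[str]:
    """Build one 'name,ABBR' equivalence rule per entry, ordered by abbreviation."""
    return [f"{name},{abbr}" for abbr, name in sorted(abbreviations.items())]


def analysis_settings(abbreviations: dict[str, str]) -> dict:
    """Return the 'analysis' section with the same layout as the example above."""
    return {
        "filter": {
            "synonyms_filter": {
                "type": "synonym",
                "synonyms": synonym_rules(abbreviations),
            }
        },
        "analyzer": {
            "synonym_analyzer": {
                "tokenizer": "standard",
                "filter": ["lowercase", "synonyms_filter"],
            }
        },
    }


# Print the JSON that would go under settings.index.analysis when creating the index.
print(json.dumps(analysis_settings(ABBREVIATIONS), indent=2))
```

The printed JSON can be pasted under settings.index.analysis in the index creation request.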