How to include synonyms in Elasticsearch using the R package elastic

Time:05-30

I would like to include synonyms in Elasticsearch using the R package elastic, preferably at search time only. I can't get this working. Hope someone can help me out. Thanks!

Here I give one example assuming that brain, mind, and smart are synonyms.

My code in R...

library(elastic)
connection <- connect()
#index_delete(connection,"test")
index_create(connection, "test")

properties <-
  '{
   "properties": {
        "sentence": {
            "type":                "text",
            "position_increment_gap": 100
        }
    }
  }'

mapping_create(connection, "test", body = properties)

sentences <- data.frame(sentence = c("This is a brain", "This is a mind", "This is fun", "This is smart"))
document  <- cbind(1,sentences)
colnames(document)[1] <- "document"

docs_bulk(connection,document,"test")

emptyBody <-
  '{
  "query": {
    "match_phrase": {
      "sentence": {
        "query": "this mind",
        "slop": 100
      }
    }
  }
}'

Search(connection,"test",body=emptyBody)

... returns...

"This is a mind"

But I want...

"This is a brain" 
"This is a mind"
"This is smart"

Settings? Based on the documentation of the R package elastic and some general searching, I tried the following code block, placed before the 'properties' code block, but it had no effect. :(

settings <- '{
  "analysis": {
    "analyzer": {
      "synonym_analyzer": {
        "tokenizer": "standard",
        "filter": ["lowercase", "synonym_filter"]
      }
    },
    "filter": {
      "synonym_filter": {
        "type": "synonym_graph",
        "synonyms": [
          "brain, mind, smart"
        ]
      }
    }
  }
}'

index_analyze(connection, "test", body = settings)

CodePudding user response:

Are you referencing the synonym analyzer in the field mapping?

  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "search_analyzer": "synonym_analyzer"
      }
    }
  }

CodePudding user response:

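I found the solution.

I had to create the index with the analysis settings included (instead of using the index_analyze function). As far as I can tell, index_analyze only runs text through an analyzer and does not change the index settings.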

settings <- '
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "my_graph_synonyms": {
            "type": "synonym_graph",
            "synonyms": [
              "mind, brain",
              "brain storm, brainstorm, envisage"
            ]
          }
        },
        "analyzer": {
          "my_index_time_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "stemmer"
            ]
          },
          "my_search_time_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "stemmer",
              "my_graph_synonyms"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "sentence": {
        "type": "text",
        "analyzer": "my_index_time_analyzer",
        "search_analyzer": "my_search_time_analyzer"
      }
    }
  }
}'

index_create(connection, "test", body = settings)

This is based on the example shared by Alexander Marquardt.
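
To check that the synonyms are applied at search time, something along these lines should work. This is only a minimal sketch: it assumes the index "test" was created with the settings above and that a local cluster is reachable. The helper names phrase_query, hit_sentences, and search_sentences are hypothetical (they are not part of the elastic package), and the Sys.sleep(1) assumes the default 1-second refresh interval.

```r
# Build a match_phrase query body for the "sentence" field.
# Hypothetical helper; `text` is inserted as-is, without JSON escaping.
phrase_query <- function(text, slop = 100) {
  sprintf(
    '{"query": {"match_phrase": {"sentence": {"query": "%s", "slop": %d}}}}',
    text, as.integer(slop)
  )
}

# Pull the "sentence" values out of a parsed search response
# (the nested-list form returned when asdf = FALSE).
hit_sentences <- function(res) {
  vapply(
    res$hits$hits,
    function(hit) hit[["_source"]][["sentence"]],
    character(1)
  )
}

# Index the example sentences and run the phrase query.
# Assumes the index already exists with the synonym settings.
search_sentences <- function(connection, index = "test") {
  sentences <- data.frame(
    document = 1,
    sentence = c("This is a brain", "This is a mind",
                 "This is fun", "This is smart"),
    stringsAsFactors = FALSE
  )
  elastic::docs_bulk(connection, sentences, index = index)
  Sys.sleep(1)  # assumption: default refresh interval, so docs become searchable
  res <- elastic::Search(connection, index = index,
                         body = phrase_query("this mind"))
  hit_sentences(res)
}
```

Calling search_sentences(elastic::connect()) should then return the "brain" and "mind" sentences. With the synonym list above I would not expect "This is smart" to match; for that, "smart" would have to be added to the rule, e.g. "mind, brain, smart".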
