Mongo Atlas search index for both partial matches and case-insensitive queries

Time:08-25

Using Mongo Atlas Search, I have already set up searching with partially matched queries:

I created this index (without dynamic field mapping), called "search_organizations_name":

{
   "name": {
       "type": "string",
       "analyzer": "lucene.keyword",
       "searchAnalyzer": "lucene.keyword"
   }
}

Then I used it in code like this (simplified and anonymised):

func (r *Repo) Search(ctx context.Context, query string) ([]Organization, error) {
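    // strings.Fields drops empty strings from repeated spaces; an empty
    // term would become "**" and match every document.
    querySplit := strings.Fields(query)

    // Wrap each term in * wildcards so it matches anywhere in the name.
    for i := range querySplit {
        querySplit[i] = fmt.Sprintf("*%s*", querySplit[i])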
    }

    // Define pipeline stages.
    searchStage := bson.D{
        {"$search", bson.D{
            {"index", "search_organizations_name"},
            {"wildcard", bson.D{
                {"path", "name"},
                {"query", querySplit},
            }},
        }},
    }

    // Run pipeline.
    cursor, err := r.organizationsCollection().
        Aggregate(ctx, mongo.Pipeline{searchStage})
    if err != nil {
        return nil, errors.Wrap(err, "running search pipeline")
    }

    var orgs []Organization
    if err = cursor.All(ctx, &orgs); err != nil {
        return nil, errors.Wrap(err, "parsing organizations to return")
    }

    return orgs, nil
}
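A minimal sketch of a term-building helper for the wildcard approach (the name wildcardTerms is hypothetical). Splitting with strings.Split(query, " ") leaves empty strings for doubled or leading spaces, and an empty term becomes the pattern "**", which matches every document; strings.Fields avoids that. The helper also escapes *, ? and \ in user input so they are matched literally, assuming the wildcard operator's backslash escaping.

```go
package main

import (
	"fmt"
	"strings"
)

// wildcardEscaper escapes the characters the wildcard operator treats as
// special (assumed: backslash, '*' and '?'). Replacer works in a single pass,
// so inserted backslashes are not escaped a second time.
var wildcardEscaper = strings.NewReplacer(`\`, `\\`, `*`, `\*`, `?`, `\?`)

// wildcardTerms (hypothetical helper) turns free text into "*term*" patterns,
// skipping empty terms produced by extra whitespace.
func wildcardTerms(query string) []string {
	fields := strings.Fields(query)
	terms := make([]string, 0, len(fields))
	for _, f := range fields {
		terms = append(terms, "*"+wildcardEscaper.Replace(f)+"*")
	}
	return terms
}

func main() {
	fmt.Printf("%q\n", wildcardTerms("  acme   corp ")) // ["*acme*" "*corp*"]
	fmt.Printf("%q\n", wildcardTerms("what?"))          // ["*what\\?*"]
}
```

The returned slice can be passed as the "query" value of the wildcard stage; the operator matches a document if any of the patterns match.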

This works, but the search is case-sensitive, which is not ideal. Researching the topic turned up the following:

  • A suggestion to use collation, but according to the docs, search indexes don't support it.
  • A suggestion to use lucene.standard, since it's case-insensitive, but it doesn't support partial matches, i.e. the query "org" wouldn't match the word "organisation".
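The two findings can be illustrated with a rough, stdlib-only model (function names are hypothetical, and real Lucene analysis is more involved). lucene.keyword indexes the whole value as one unmodified token and the wildcard query itself is not analyzed, so matching is case-sensitive. lucene.standard lowercases and splits into words, so a text query matches regardless of case but only on whole words.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// keywordWildcard models a "*term*" wildcard query on a lucene.keyword field:
// the whole value is a single token stored as-is, so this is a plain
// case-sensitive substring test.
func keywordWildcard(value, term string) bool {
	return strings.Contains(value, term)
}

// standardText models a text query on a lucene.standard field: value and
// query are lowercased and the value is split into words; a hit needs a
// whole-word match (simplification: the query is treated as one word).
func standardText(value, term string) bool {
	isSep := func(r rune) bool { return !unicode.IsLetter(r) && !unicode.IsDigit(r) }
	q := strings.ToLower(term)
	for _, w := range strings.FieldsFunc(strings.ToLower(value), isSep) {
		if w == q {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(keywordWildcard("Organisation", "org")) // false: case differs
	fmt.Println(keywordWildcard("organisation", "org")) // true
	fmt.Println(standardText("Organisation", "ORG"))    // false: partial word
	fmt.Println(standardText("Big Org", "org"))         // true: whole word, any case
}
```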

I need the search to be able to work with both case-insensitive queries and partial matches.

Am I looking in the wrong direction or asking for too much?

Answer:

A possible solution for your use case is the autocomplete type with nGram tokenization. It supports both partial and case-insensitive matches.
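A simplified model of why nGram autocomplete covers both cases, assuming the documented autocomplete defaults (lucene.standard analyzer, minGrams 2, maxGrams 15) and approximating the analyzer with a lowercase-and-split-on-non-alphanumerics step. Every lowercased substring of each word with a length between minGrams and maxGrams is indexed, so a lowercased query term such as "org" matches "Organisation". The function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Assumed to mirror the autocomplete defaults; adjust if the index sets
// minGrams/maxGrams explicitly.
const (
	minGrams = 2
	maxGrams = 15
)

// words roughly approximates lucene.standard: lowercase, then split on any
// rune that is not a letter or digit.
func words(s string) []string {
	return strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// nGrams returns every substring of each word whose length lies in
// [minGrams, maxGrams], i.e. the grams an nGram index would store.
func nGrams(value string) map[string]bool {
	grams := make(map[string]bool)
	for _, w := range words(value) {
		r := []rune(w)
		for start := range r {
			for n := minGrams; n <= maxGrams && start+n <= len(r); n++ {
				grams[string(r[start:start+n])] = true
			}
		}
	}
	return grams
}

// autocompleteMatches reports whether one query term hits the value
// under this simplified model.
func autocompleteMatches(value, term string) bool {
	return nGrams(value)[strings.ToLower(term)]
}

func main() {
	fmt.Println(autocompleteMatches("Organisation Ltd", "org")) // true
	fmt.Println(autocompleteMatches("Organisation Ltd", "ORG")) // true
	fmt.Println(autocompleteMatches("Organisation Ltd", "o"))   // false: shorter than minGrams
}
```

The trade-off is index size: each word contributes many grams, which is why nGram indexes grow faster than keyword or edgeGram ones.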

The index mapping for this approach can be:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "name": [
        {
          "type": "string"
        },
        {
          "tokenization": "nGram",
          "type": "autocomplete"
        }
      ]
    }
  }
}

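The search stage would then look something like this, where querySplit holds the plain search terms without the * wrappers, since autocomplete does not treat * as a wildcard: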

{
   "$search":{
      "autocomplete":{
         "query": querySplit,
         "path":"name"
      },
      "index":"search_organizations_name"
   }
}