.NET Elastic Search Create NGram Index

Time:08-17

I am trying to set up Elasticsearch as a prototype for a project that might use it. The project needs to index the contents of datasets and make them searchable.

What I have right now is the following:

  • Index documents
  • Search all fields of the indexed documents for complete words

Missing right now is:

  • Search all fields of the indexed documents for partial words

That means I can find the following sample dataset from my database by searching for, e.g., "Sofia", "sofia", "anderson" or "canada", but not by searching for "canad":

{ "id": 46, "firstName": "Sofia", "lastName": "Anderson", "country": "Canada" }

I am creating my index with the "Elastic.Clients.Elasticsearch" NuGet package. I am trying to create an index with an NGram tokenizer and apply it to all fields, but somehow that does not seem to work.

This is the code I use to create the index:

Client.Indices.Create(IndexName, c => c
    .Settings(s => s
        .Analysis(a => a
            .Tokenizer(t => t.Add(TokenizerName, new Tokenizer(new TokenizerDefinitions(new Dictionary<string, ITokenizerDefinition>() { { TokenizerName, ngram } }))))
            .Analyzer(ad => ad
                .Custom(AnalyzerName, ca => ca
                    .Tokenizer(TokenizerName)
                )
            )
        )
    )
    .Mappings(m => m
        .AllField(all => all
            .Enabled()
            .Analyzer(AnalyzerName)
            .SearchAnalyzer(AnalyzerName)
        )
    )
);

with:

private string TokenizerName => "my_tokenizer";
private string AnalyzerName => "my_analyzer";

and:

var ngram = new NGramTokenizer
{
    MinGram = 3,
    MaxGram = 3,
    TokenChars = new List<TokenChar> { TokenChar.Letter },
    CustomTokenChars = ""
};

With this code I get the behaviour described above.

Is there any error in my code? Am I missing something? Do you need further information?

Thanks in advance

Paul

Answer:

I did not find a way to get this working with the .NET client.

However, what worked for me was to create the index with this REST API call (an HTTP PUT request):

URL:

https://{{elasticUrl}}/{{indexName}}

Body:

{
    "mappings": {
        "properties": {
            "firstName": {
                "type":"text",
                "analyzer":"index_ngram",
                "search_analyzer":"search_ngram"
            },
            "lastName": {
                "type":"text",
                "analyzer":"index_ngram",
                "search_analyzer":"search_ngram"
            },
            "country": {
                "type":"text",
                "analyzer":"index_ngram",
                "search_analyzer":"search_ngram"
            }
        }
    },
    "settings": {
        "index": {
            "max_ngram_diff":50
        },
        "analysis": {
            "filter": {
                "ngram_filter": {
                    "type": "ngram",
                    "min_gram": 2,
                    "max_gram": 25
                }
            },
            "analyzer": {
                "index_ngram": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": [ "ngram_filter", "lowercase" ]
                },
                "search_ngram": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": [ "lowercase" ]
                }
            }
        }
    }
}

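These settings index n-grams with lengths from 2 to 25 characters for the fields firstName, lastName and country. The keyword tokenizer keeps each field value as a single token, so the n-grams are substrings of the whole value, while the search analyzer only lowercases the query. A search for "canad" therefore matches the gram "canad" indexed from "Canada". The "max_ngram_diff" setting is required because the difference between max_gram and min_gram (23) exceeds the default limit of 1.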
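To see which terms these analyzers produce, here is a rough plain-C# imitation of index_ngram and search_ngram. It is only a sketch: it does not talk to Elasticsearch, the function names IndexNgram, SearchNgram and Matches are made up for illustration, and it ignores details such as token positions and Unicode surrogate pairs. It prints the longer grams generated for "Canada" and checks the partial match for "canad".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Rough imitation of the custom analyzers from the index settings (no Elasticsearch involved).
// IndexNgram, SearchNgram and Matches are illustrative names, not part of any client library.
const int MinGram = 2;  // ngram_filter "min_gram"
const int MaxGram = 25; // ngram_filter "max_gram"

// index_ngram: keyword tokenizer (the whole value is one token) -> ngram_filter -> lowercase
List<string> IndexNgram(string value)
{
    var grams = new List<string>();
    for (int start = 0; start < value.Length; start++)
    {
        // every substring starting at 'start' whose length is between MinGram and MaxGram
        for (int len = MinGram; len <= MaxGram && start + len <= value.Length; len++)
        {
            grams.Add(value.Substring(start, len).ToLowerInvariant());
        }
    }
    return grams;
}

// search_ngram: keyword tokenizer -> lowercase; the query itself is NOT split into n-grams
string SearchNgram(string query) => query.ToLowerInvariant();

// The single query term matches if it equals one of the terms indexed for the field.
bool Matches(string fieldValue, string query) =>
    IndexNgram(fieldValue).Contains(SearchNgram(query));

Console.WriteLine(string.Join(", ", IndexNgram("Canada").Where(g => g.Length >= 5))); // canad, canada, anada
Console.WriteLine(Matches("Canada", "canad")); // True
```

The same model also shows the limits of this setup: single-character queries are shorter than min_gram and never match, and a query such as "sofia anderson" stays one token, so it matches none of the per-field grams.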

I hope this helps someone in the future :)
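Since the question was about .NET: the same PUT request can also be sent from C# with a plain HttpClient instead of the typed client. This is a minimal sketch, assuming the JSON body above is available as a string; BuildCreateIndexRequest and CreateIndexAsync are hypothetical helpers, and authentication (API key or basic auth) and TLS configuration are left out.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helpers: send the "create index" request with a plain HttpClient.
// elasticUrl, indexName and settingsJson are placeholders; auth and TLS setup are omitted.

// PUT {elasticUrl}/{indexName} with the mappings/settings JSON as the request body.
HttpRequestMessage BuildCreateIndexRequest(string elasticUrl, string indexName, string settingsJson) =>
    new HttpRequestMessage(HttpMethod.Put, $"{elasticUrl.TrimEnd('/')}/{indexName}")
    {
        Content = new StringContent(settingsJson, Encoding.UTF8, "application/json")
    };

// Sends the request and throws with Elasticsearch's error body if it is rejected
// (for example because the index already exists).
async Task CreateIndexAsync(HttpClient http, string elasticUrl, string indexName, string settingsJson)
{
    using var request = BuildCreateIndexRequest(elasticUrl, indexName, settingsJson);
    using var response = await http.SendAsync(request);
    string responseBody = await response.Content.ReadAsStringAsync();
    if (!response.IsSuccessStatusCode)
    {
        throw new HttpRequestException(
            $"Creating index '{indexName}' failed with {(int)response.StatusCode}: {responseBody}");
    }
}
```

Usage would look like `await CreateIndexAsync(http, "https://localhost:9200", "people", File.ReadAllText("index-settings.json"));`, where the URL, index name and file name are placeholders and `http` is a long-lived HttpClient instance.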
