I am trying to set up elastic search as a prototype for a project that might use it. The project needs to look through the contents of datasets and make them searchable.
What I have right now is the following:
- Index documents
- Search through all fields of the indexed documents for the full text
Missing right now is:
- Search through all fields of the indexed documents for partial text
That means I can find this sample dataset from my database by searching for e.g. "Sofia" , "sofia", "anderson" or "canada", but not by searching for "canad".
{ "id": 46, "firstName": "Sofia", "lastName": "Anderson", "country": "Canada" }
I am creating my index using the "Elastic.Clients.Elasticsearch" NuGet package. I try to create an Index with a NGram-Tokenizer and apply it to all fields. That seems to be somehow not working.
This is the code that I use to create the Index:
Client.Indices.Create(IndexName, c => c
.Settings(s => s
.Analysis(a => a
.Tokenizer(t => t.Add(TokenizerName, new Tokenizer(new TokenizerDefinitions(new Dictionary<string, ITokenizerDefinition>() { { TokenizerName, ngram } }))))
.Analyzer(ad => ad
.Custom(AnalyzerName, ca => ca
.Tokenizer(TokenizerName)
)
)
)
)
.Mappings(m => m
.AllField(all => all
.Enabled()
.Analyzer(AnalyzerName)
.SearchAnalyzer(AnalyzerName)
)
)
);
with
private string TokenizerName => "my_tokenizer";
private string AnalyzerName => "my_analyzer";
and
var ngram = new NGramTokenizer() { MinGram = 3, MaxGram = 3, TokenChars = new List<TokenChar>() { TokenChar.Letter }, CustomTokenChars = "" };
With this code I get the behaviour described above.
Is there any error in my code? Am I missing something? Do you need further information?
Thanks in advance
Paul
CodePudding user response:
I did not find a way to get this running in .NET.
However what worked for me was to create the index using this API call:
URL:
https://{{elasticUrl}}/{{indexName}}
Body:
{
"mappings": {
"properties": {
"firstName": {
"type":"text",
"analyzer":"index_ngram",
"search_analyzer":"search_ngram"
},
"lastName": {
"type":"text",
"analyzer":"index_ngram",
"search_analyzer":"search_ngram"
},
"country": {
"type":"text",
"analyzer":"index_ngram",
"search_analyzer":"search_ngram"
}
}
},
"settings": {
"index": {
"max_ngram_diff":50
},
"analysis": {
"filter": {
"ngram_filter": {
"type": "ngram",
"min_gram": 2,
"max_gram": 25
}
},
"analyzer": {
"index_ngram": {
"type": "custom",
"tokenizer": "keyword",
"filter": [ "ngram_filter", "lowercase" ]
},
"search_ngram": {
"type": "custom",
"tokenizer": "keyword",
"filter": "lowercase"
}
}
}
}
}
This leads to an NGram with term lengths from 2 to 25 for the fields: firstName
, lastName
, country
.
I hope this helps someone in the future :)