Get and manipulate data from elasticsearch

Time:07-19

I am new to Elasticsearch, so I will need some help. Unfortunately, I didn't find the answer in other topics here on SO.

I have a .NET Core application which I inherited, and now I need to implement some changes. I already have a method for getting data from Elasticsearch, but once I have the data, I am not sure how to modify it and use it in the application.

To be precise, I need to parse the first and last names and remove special characters, specifically Serbian Latin letters such as "šđžčć". I already have a method written for this parsing, but I am not sure how to call it.

So, my question is: can I do this, and if so, how?

What I have now is the following:

var result = await _elasticClient.SearchAsync<CachedUserEntity>(
                s =>
                    s.Index(_aliasName)
                       .Query(q => andQuery));

CachedUserEntity contains, among others, the properties FirstName and LastName.

Inside result.Documents I get the FirstName and LastName data from Elasticsearch, but I am not sure how to access it in order to update it via the aforementioned NameParser.

Sorry if the question is too easy, not to say stupid :)

CodePudding user response:

I would not use updateByQuery here. Instead, I would scroll over the documents (I use MatchAll in my example; you obviously need to replace it with your own query). If you cannot identify the documents to update in the query itself, filter them in code and pass only the relevant documents to the UpdateManyWithIndex/UpdateManyPartial function.

For performance, several documents have to be updated at once, so we use a bulk (UpdateMany) function.

You can use either solution: the classic update with the full document, or the second one (partial update) with an object containing only the targeted fields. On the server side, both solutions have the same cost and performance.

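// Open a scroll over the documents; replace MatchAll with your own query.
var searchResponse = Client.Search<CachedUserEntity>(s => s
    .Query(q => q
        .MatchAll()
    )
    .Scroll("10s")
);

// Each iteration handles one page of results, then fetches the next page.
while (searchResponse.Documents.Any())
{
    // RemoveChar is your own helper that returns the documents with cleaned names.
    List<CachedUserEntity> newSearchResponse = RemoveChar(searchResponse);
    UpdateManyWithIndex<CachedUserEntity>(newSearchResponse, _aliasName);
    searchResponse = Client.Scroll<CachedUserEntity>("2h", searchResponse.ScrollId);
}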





public void UpdateManyWithIndex<C>(List<C> obj, string index) where C : class
{
    // Send all updates in one bulk request and wait until they are visible to search.
    var bulkResponse = Client.Bulk(b => b
        .Index(index) // explicitly provide index name
        .Refresh(Elasticsearch.Net.Refresh.WaitFor)
        .UpdateMany<C>(obj, (bu, d) => bu.Doc(d)));
}
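
The RemoveChar helper used in the loop above is not shown in this answer, and the asker's NameParser is not shown either. As a minimal sketch of what the per-name cleaning step might look like, the class below (NameNormalizer and ToAscii are hypothetical names, not part of NEST or of the original code) strips diacritics using Unicode decomposition. Mapping "đ" to "dj" is an assumption; some systems prefer plain "d".

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper, not part of the original code or of NEST.
public static class NameNormalizer
{
    // Converts Serbian Latin letters such as š, đ, ž, č, ć to plain ASCII letters.
    public static string ToAscii(string input)
    {
        if (string.IsNullOrEmpty(input)) return input;

        // đ/Đ have no Unicode decomposition, so they are replaced explicitly.
        // Assumption: "dj" is the wanted transliteration.
        var replaced = input.Replace("đ", "dj").Replace("Đ", "Dj");

        // FormD splits a letter like š into 's' followed by a combining caron.
        var decomposed = replaced.Normalize(NormalizationForm.FormD);

        var sb = new StringBuilder(decomposed.Length);
        foreach (var ch in decomposed)
        {
            // Keep everything except the combining marks.
            if (CharUnicodeInfo.GetUnicodeCategory(ch) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(ch);
            }
        }
        return sb.ToString();
    }
}
```

Inside RemoveChar you could then loop over searchResponse.Documents and assign, for example, doc.FirstName = NameNormalizer.ToAscii(doc.FirstName) before collecting the documents into the list that is passed to the bulk update.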
        
    

Or, using a partial update object:

Note: in this case the index is already set on my client (add .Index if needed).

var searchResponse = Client.Search<CachedUserEntity>(s => s
    .Query(q => q
        .MatchAll()
    )
    .Scroll("2h")
);

while (searchResponse.Documents.Any())
{
    // Build partial documents that carry only the Id and the cleaned name fields.
    List<object> listPocoPartialObj = GetPocoPartialObjList(searchResponse.Documents.ToList());
    UpdateManyPartial(listPocoPartialObj);
    searchResponse = Client.Scroll<CachedUserEntity>("2h", searchResponse.ScrollId);
}


private List<object> GetPocoPartialObjList(List<CachedUserEntity> cachedList)
{
    List<object> listPoco = new List<object>();
    // Note: if the entity has no Id property, take a look at result.source; leave a comment if you need help.
    foreach (var eltCached in cachedList)
    {
        // An anonymous object holds only the fields to update; replace the placeholders with your cleaned values.
        listPoco.Add(new { Id = eltCached.Id, FirstName = YOURFIELDWITHOUTSPECIALCHAR, LastName = YOURSECONDFIELDWITHOUTSPECIALCHAR });
    }
    return listPoco;
}


public bool UpdateManyPartial(List<object> partialObj)
{
    var bulkResponse = Client.Bulk(b => b
        .Refresh(Elasticsearch.Net.Refresh.WaitFor)
        .UpdateMany(partialObj, (bu, d) => bu.Doc(d))
    );

    if (!bulkResponse.IsValid)
    {
        // GetErrorMsgs is my own error-reporting helper (not shown here).
        GetErrorMsgs(bulkResponse);
    }
    return bulkResponse.IsValid;
}
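
To make the partial update less of a black box, here is a rough sketch of the newline-delimited body that the Elasticsearch _bulk endpoint expects for partial updates: one action line, then one "doc" line per document. BulkBodySketch is a hypothetical name used only for illustration, and the camelCase field names (firstName, lastName) are an assumption about how the client serializes the properties. The NEST client builds this body for you; you would not normally write it by hand.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

// Hypothetical illustration only; NEST generates the equivalent body internally.
public static class BulkBodySketch
{
    // Builds the NDJSON body of a _bulk request with one partial update per user.
    public static string Build(
        string index,
        IEnumerable<(string Id, string FirstName, string LastName)> users)
    {
        var sb = new StringBuilder();
        foreach (var u in users)
        {
            // Action line: which document to update, and in which index.
            var action = new Dictionary<string, object>
            {
                ["update"] = new Dictionary<string, object> { ["_index"] = index, ["_id"] = u.Id }
            };
            // Payload line: only the fields that should change.
            // Assumption: the field names are camelCased in the index.
            var payload = new Dictionary<string, object>
            {
                ["doc"] = new Dictionary<string, object>
                {
                    ["firstName"] = u.FirstName,
                    ["lastName"] = u.LastName
                }
            };
            sb.Append(JsonSerializer.Serialize(action)).Append('\n');
            sb.Append(JsonSerializer.Serialize(payload)).Append('\n');
        }
        return sb.ToString();
    }
}
```

Because only the listed fields appear under "doc", the other properties of the stored document are left untouched, which is the point of the partial variant.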