In Spring Data Elasticsearch, how can I project or add a calculated field on all results?

I am using spring-data-elasticsearch (latest version) along with a Docker instance of Elasticsearch (latest version), and I want to calculate a field on all results that are returned from my repository after a query. I do not want this information in the repository, because it is sometimes query dependent and sometimes environment dependent. For example, when we perform a query, I want to enrich each result with a URL that includes the query terms as query parameters. There are some other cases, too. I have tried creating a Spring Data custom reading converter that accepts the whole document object. I can see that it is recognized when the application starts, but it is never invoked. How can I either project a field with a custom value, or enrich the returned documents with a contextually calculated value?

CodePudding user response：

Like Chin commented, I first thought about AfterConvertCallback as well, but in a callback you have no context of the query that was run to get the entity, so you cannot use things like the query terms to build a value.

I would add the property - let's name it url of type String here - to the entity and mark it with the org.springframework.data.annotation.Transient annotation to prevent it from being stored.

Then in the method where you do the search, either using ElasticsearchOperations or a repository, post-process the returned entities (code not tested, just written down here):

SearchHits<Entity> searchHits = repository.findByFoo(fooValue);
searchHits.getSearchHits().forEach(searchHit -> {
    searchHit.getContent().setUrl(someValueDerivedFromEnvironmentAndQuery);
});

After that proceed using the SearchHits.
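To make the post-processing step concrete without pulling in Spring, here is a minimal plain-Java sketch of the same idea. The Book and Hit classes are hypothetical stand-ins for the entity and for Spring Data's SearchHit, and the base URL and query term are made-up values; in the real entity the url field would carry the @Transient annotation.

```java
import java.util.ArrayList;
import java.util.List;

public class EnrichHitsExample {

    // Hypothetical stand-in for the entity; url is calculated and never stored.
    static class Book {
        final String id;
        String url;
        Book(String id) { this.id = id; }
    }

    // Hypothetical stand-in for SearchHit<T>, which wraps the entity in a similar way.
    static class Hit<T> {
        private final T content;
        Hit(T content) { this.content = content; }
        T getContent() { return content; }
    }

    // Sets the calculated url on every hit, using values only the calling method knows.
    // Assumes queryTerm is already safe to place in a URL.
    static List<Hit<Book>> enrich(List<Hit<Book>> hits, String baseUrl, String queryTerm) {
        hits.forEach(hit -> {
            Book book = hit.getContent();
            book.url = baseUrl + "?id=" + book.id + "&term=" + queryTerm;
        });
        return hits;
    }

    public static void main(String[] args) {
        List<Hit<Book>> hits = new ArrayList<>();
        hits.add(new Hit<>(new Book("1")));
        hits.add(new Hit<>(new Book("2")));
        enrich(hits, "https://example.org/books", "elasticsearch")
                .forEach(hit -> System.out.println(hit.getContent().url));
    }
}
```

The point of doing this in the calling method rather than in a converter or callback is that the query term is still in scope there.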

CodePudding user response：

I like a hybrid approach of combining the answers from both @ChinHuang and @PJMeisch. Both answers have their applicability, depending on the context or situation. I like Chin Huang's suggestion for instance-based information, where you would need things like configuration values. I also agree that PJ Meisch is correct in his concern that this does not give you access to the immediate query, so I like his idea of intercepting/mapping the values when the data is being returned from the data store. I appreciate the great information from both people, because this combination of both approaches is a solution that I am happy with.

I prefer to use a repository interface wherever possible, because many people incorrectly mix business logic into their repositories. If I want a custom implementation, then I am forced to really think about it, because I have to create an "Impl" class to achieve it. Mixing the two is not the gravest of errors, but I always accompany a repository with a business service that is responsible for any data grooming, or any programmatic action that is not strictly retrieval or persistence of data.

Here is the part of my module configuration where I create the custom AfterConvertCallback. I set the base URL in the onAfterConvert method:

@Bean
AfterConvertCallback<BookInfo> bookInfoAfterConvertCallback() {
    return new BookInfoAfterConvertCallback(documentUrl);
}

static class BookInfoAfterConvertCallback implements AfterConvertCallback<BookInfo> {

    private final String documentUrl;

    public BookInfoAfterConvertCallback(String documentUrl) {
        this.documentUrl = documentUrl;
    }

    @Override
    public BookInfo onAfterConvert(final BookInfo entity, final Document document, final IndexCoordinates indexCoordinates) {
        entity.setUrl(String.format("%s?id=%d", documentUrl, entity.getId()));
        return entity;
    }
}

In the data service that invokes the repository query, I wrote a pair of functions that create the query parameter portion of the URL, so that I can append it in any applicable method that uses the autowired repository instance:

/**
 * Given a term, encode it so that it can be used as a query parameter in a URL
 */
private static final Function<String, String> encodeTerm = term -> {
    try {
        return URLEncoder.encode(term, StandardCharsets.UTF_8.name());
    } catch (UnsupportedEncodingException e) {
        log.warn("Could not encode search term for document URL", e);
        return null;
    }
};

/**
 * Given a list of search terms, transform them into encoded URL query parameters and append
 * them to the given URL.
 */
private static final BiFunction<List<String>, String, String> addEncodedUrlQueryParams = (searchTerms, url) ->
        searchTerms.stream()
                // encode first, so that terms that could not be encoded (null) are dropped
                .map(encodeTerm)
                .filter(Objects::nonNull)
                .map(encoded -> String.format("term=%s", encoded))
                .collect(Collectors.joining("&", url + "&", ""));
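As a quick illustration of what this kind of helper produces, here is a small self-contained sketch. The class name, the withTerms method, and the example URL are made up for this illustration. It assumes Java 10 or later, where URLEncoder.encode accepts a Charset and no checked exception has to be handled, and it assumes the URL passed in already contains a query string (such as the ?id= part set by the callback).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

public class TermUrlExample {

    // Hypothetical helper: appends each search term as an encoded "term" parameter.
    // Assumes url already has a query string, so the new parameters are joined with "&".
    static String withTerms(String url, List<String> searchTerms) {
        return searchTerms.stream()
                .map(term -> "term=" + URLEncoder.encode(term, StandardCharsets.UTF_8))
                .collect(Collectors.joining("&", url + "&", ""));
    }

    public static void main(String[] args) {
        String url = withTerms("https://example.org/books?id=42", List.of("spring data", "c++"));
        // prints https://example.org/books?id=42&term=spring+data&term=c%2B%2B
        System.out.println(url);
    }
}
```

Note that with an empty term list this leaves a trailing "&" on the URL, so a caller may want to skip the call when there are no terms.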

This absolutely can all be done in a repository instance, or in its enclosing service. But, when you want to intercept all data that is retrieved, and do something with it that is not specific to the query, then the callback is a great option because it does not incur the maintenance cost of needing to introduce it in every data layer method where it should apply. At query time, when you need to reference information that is only available in the query, it is clearly a matter of introducing this type of code into your data layer (service or repo) methods.

I am adding this as an answer because, even though I didn't realize it at the time that I posted my question, these are two concerns that are separate enough to warrant both approaches. I do not want to claim credit for this answer, so I will not select it as the answer unless you both comment on this and tell me that you want me to do that.