Elasticsearch date based function scoring boosting the wrong way

Time:04-08

I would like to boost scores of documents based on how "recent" a document is. I am trying to do this using a function_score. Here is an example of me doing this on a field called updated_at:

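{
    "function_score": {
        "boost_mode": "sum",  # add the function score to the query score (default: multiply)
        "functions": [
            {
                "exp": {
                    "updated_at": {
                        "origin": "now",  # decay is measured from the current time
                        "scale": "1h",    # distance from origin at which the score equals `decay`
                        "decay": 0.01,    # function score at `scale` away from origin
                    },
                },
                "weight": 1,
            }
        ],
        "query": query,  # the original query clause, defined elsewhere
    },
}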

I would expect documents whose updated_at is close to now to get a function score close to 1, and documents whose updated_at is about one scale (here 1h) away from now to get a function score close to decay (as described in the docs). That is why I'm using boost_mode sum: it keeps the original query score and adds a bonus that depends on how close updated_at is to now. (The query score is meaningful here, so I would rather add than multiply, which is the default boost_mode.)
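Here is a minimal Python sketch (Python because the query above is written as a Python dict) of the arithmetic behind that expectation. It follows the exponential decay formula from the Elasticsearch function_score documentation, exp(lambda * max(0, |value - origin| - offset)) with lambda = ln(decay) / scale, which gives 1 at the origin and `decay` at one `scale` away, and then applies `boost_mode: sum`. The helper names and the query score of 2.0 (taken from the test described next) are illustrative assumptions, not Elasticsearch code.

```python
import math


def exp_decay(distance_hours: float, scale_hours: float = 1.0,
              decay: float = 0.01, offset_hours: float = 0.0) -> float:
    """Exponential decay score for a value `distance_hours` away from origin.

    Mirrors the documented formula exp(lam * max(0, |distance| - offset))
    with lam = ln(decay) / scale: 1.0 at the origin, `decay` at one `scale`.
    """
    lam = math.log(decay) / scale_hours
    return math.exp(lam * max(0.0, abs(distance_hours) - offset_hours))


def sum_boost(query_score: float, function_score: float, weight: float = 1.0) -> float:
    # boost_mode "sum": final score = query score + weighted function score
    return query_score + weight * function_score


query_score = 2.0
score_a = sum_boost(query_score, exp_decay(0.0))   # A: updated just now
score_b = sum_boost(query_score, exp_decay(-1.0))  # B: updated one hour ago
print(round(score_a, 4), round(score_b, 4))  # 3.0 2.01
```

So with this configuration A should end up near 3 and B near 2.01, which is the opposite of what the question reports.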

To test this scenario, I create a document (A) that returns a query score of about 2. I then duplicate it (B) and modify the new document's updated_at timestamp to be an hour in the past.

In this scenario, I would expect (A) to have a higher score and (B) to have a lower score. However, when I run this scenario, I get the exact opposite. (B) ends up with a score of 3 and (A) ends up with a score of 2.

What am I misunderstanding here to cause this to happen? And how would I modify my function score to do what I would like?

Answer:

This turned out to be a timezone issue.

I ended up using the explain API to see what was contributing to the score. Doing that, I noticed that the origin set to now was in a different timezone from the one used for the updated_at timestamps in my documents.
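The reported scores are consistent with one plausible mechanism, sketched below under stated assumptions rather than reconstructed from the asker's data. Elasticsearch evaluates `now` in UTC and treats date strings without an offset as UTC. If documents are indexed with naive local timestamps from a machine ahead of UTC (the UTC+1 offset and the fixed clock time here are hypothetical), A appears one hour in the future and B appears to be exactly `now`. Because decay functions use the absolute distance from `origin`, a future timestamp is penalized just like a past one, so the scores flip.

```python
import math
from datetime import datetime, timedelta, timezone


def exp_decay(distance_hours: float, scale_hours: float = 1.0, decay: float = 0.01) -> float:
    # Decay uses the absolute distance from origin, so values in the past and
    # in the future are penalized equally.
    return math.exp(math.log(decay) / scale_hours * abs(distance_hours))


# Hypothetical setup: the indexing client runs at UTC+1 and writes naive local times.
local_tz = timezone(timedelta(hours=1))
now_utc = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)  # arbitrary fixed "now"
now_local_naive = now_utc.astimezone(local_tz).replace(tzinfo=None)

doc_a = now_local_naive                       # A: "updated now" on the local wall clock
doc_b = now_local_naive - timedelta(hours=1)  # B: one hour earlier


def as_seen_by_es(naive: datetime) -> datetime:
    # A date string without an offset is interpreted as UTC.
    return naive.replace(tzinfo=timezone.utc)


def hours_from_origin(naive: datetime) -> float:
    return (as_seen_by_es(naive) - now_utc).total_seconds() / 3600


query_score = 2.0
score_a = query_score + exp_decay(hours_from_origin(doc_a))  # A looks 1h in the future
score_b = query_score + exp_decay(hours_from_origin(doc_b))  # B looks like "now"
print(round(score_a, 4), round(score_b, 4))  # 2.01 3.0
```

Under these assumptions B gets the full bonus of 1 and A only 0.01, matching the scores of 3 and about 2 reported in the question.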

I fixed this by passing an explicit UTC timestamp as the origin in the Elasticsearch query instead of now.

(If there is a better way to do this, please let me know)
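A minimal sketch of that workaround, assuming the query body is built as a Python dict as in the question: the origin is computed client-side as a timezone-aware ISO 8601 timestamp instead of `now`. `utc_now_iso` and `build_recency_query` are hypothetical helper names, and the `match` query on a `title` field is a placeholder. An explicit origin only lines up if `updated_at` was indexed in the same convention. A possibly better approach, assuming the field uses the default date format, is to index `updated_at` with an explicit offset (as `doc` does below); since Elasticsearch treats offset-less date strings as UTC, unambiguous timestamps should let `"origin": "now"` work unchanged.

```python
from datetime import datetime, timezone


def utc_now_iso() -> str:
    # Timezone-aware ISO 8601 string, e.g. "2024-01-01T12:00:00.123+00:00"
    return datetime.now(timezone.utc).isoformat(timespec="milliseconds")


def build_recency_query(query: dict, origin: str) -> dict:
    """Hypothetical helper: wrap `query` in the question's function_score,
    using an explicit timestamp instead of "now" as the decay origin."""
    return {
        "function_score": {
            "boost_mode": "sum",
            "functions": [
                {
                    "exp": {
                        "updated_at": {
                            "origin": origin,
                            "scale": "1h",
                            "decay": 0.01,
                        }
                    },
                    "weight": 1,
                }
            ],
            "query": query,
        }
    }


# Indexing side: store updated_at with an explicit UTC offset so it is unambiguous.
doc = {"title": "example", "updated_at": utc_now_iso()}

# Search side: the same function_score, with an explicit UTC origin.
body = {"query": build_recency_query({"match": {"title": "example"}}, utc_now_iso())}
```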
