Home > Mobile >  Translate MySQL aggregation query to ElasticSearch
Translate MySQL aggregation query to ElasticSearch

Time:06-29

I have a comments table that over the past year has grown considerably and I'm moving it to ElasticSearch.

The problem is that I need to adapt a query that I currently have in MySQL which returns the total number of comments for each day in the last 7 days for a given post.

Here's the MySQL query that I have now:

SELECT count(*) AS number, DATE(created_at) AS date 
FROM `comments` 
WHERE `post_id` = ? 
GROUP BY `date` 
ORDER BY `date` DESC 
LIMIT 7

My index looks like this:

{
  "mappings": {
    "_doc": {
      "properties": {
        "id": {
          "type": "keyword"
        },
        "post_id": {
          "type": "integer"
        },
        "subject": {
          "analyzer": "custom_html_strip",
          "type": "text"
        },
        "body": {
          "analyzer": "custom_html_strip",
          "type": "text"
        },
        "created_at": {
          "format": "yyyy-MM-dd HH:mm:ss",
          "type": "date"
        }
      }
    }
  }
}

Is it possible to reproduce that query for ElasticSearch? If so, how would that look like?

My ElasticSearch knowledge is kinda limited, I know that it offers aggregation, but I don't really know how to put it all together.

CodePudding user response:

Use the following query to get all the comments on a given "post_id" for the last 7 days.

{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "id": {
              "value": "the_post_id"
            }
          }
        },
        /*** only include this clause if you want the recent most 7 days ***/
        {
          "range": {
            "created_at": {
              "gte": "now-7d/d",
              "lt": "now/d"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "posts_per_day": {
      "date_histogram": {
        "field": "created_at",
        "calendar_interval": "day",
        "order" : {"_key" : "desc"}
      }
    }
  }
}

From the aggregation, pick up the first 7 buckets in your client application.

CodePudding user response:

Elasticsearch supports sql queries as well (though they are limited)
so with a little change you can use something like this

GET _sql
{
  "query": """
  SELECT count(*) AS number, created_at AS date 
FROM comments
WHERE post_id = 123 
GROUP BY date
ORDER BY date DESC 
LIMIT 7
  """
}

You can see the corresponding query using _sql/translate, which will return

{
  "size" : 0,
  "query" : {
    "term" : {
      "post_id" : {
        "value" : 123,
        "boost" : 1.0
      }
    }
  },
  "_source" : false,
  "stored_fields" : "_none_",
  "aggregations" : {
    "groupby" : {
      "composite" : {
        "size" : 7,
        "sources" : [
          {
            "31239" : {
              "terms" : {
                "field" : "created_at",
                "missing_bucket" : true,
                "order" : "desc"
              }
            }
          }
        ]
      }
    }
  }
}

That being said, some of the stuff used in the translated query is not needed, so this will be a better native query

{
  "query": {
    "term": {
      "post_id": {
        "value": 123
      }
    }
  },
  "aggs": {
    "unq_dates": {
      "terms": {
        "field": "created_at",
        "size": 7,
        "order": {
          "_term": "desc"
        }
      }
    }
  }
}

CodePudding user response:

First, it is recommended that your time fields be regularly aggregated, so using fields written directly to a 'YYYY-MM-DD' format will improve your performance and reduce your consumption of resources

{
    "query":
    {
        "filter":
        [
            {
                "term":
                {
                    "post_id":
                    {
                        "value": 1
                    }
                }
            }
        ]
    },
    "_source": false,
    "aggs":
    {
        "created_at":
        {
            "auto_date_histogram":
            {
                "field": "created_at",
                "buckets": 7,
                "format": "yyyy-MM-dd"
            }
        }
    }
}
  • Related