I have a comments table that has grown considerably over the past year, and I'm moving it to Elasticsearch.
The problem is that I need to adapt a query that I currently have in MySQL which returns the total number of comments for each day in the last 7 days for a given post.
Here's the MySQL query that I have now:
SELECT count(*) AS number, DATE(created_at) AS date
FROM `comments`
WHERE `post_id` = ?
GROUP BY `date`
ORDER BY `date` DESC
LIMIT 7
My index looks like this:
{
  "mappings": {
    "_doc": {
      "properties": {
        "id": {
          "type": "keyword"
        },
        "post_id": {
          "type": "integer"
        },
        "subject": {
          "analyzer": "custom_html_strip",
          "type": "text"
        },
        "body": {
          "analyzer": "custom_html_strip",
          "type": "text"
        },
        "created_at": {
          "format": "yyyy-MM-dd HH:mm:ss",
          "type": "date"
        }
      }
    }
  }
}
Is it possible to reproduce that query in Elasticsearch? If so, what would it look like?
My Elasticsearch knowledge is somewhat limited; I know that it offers aggregations, but I don't really know how to put it all together.
CodePudding user response:
Use the following query to get the number of comments per day on a given "post_id" for the last 7 days. Include the range clause only if you want to restrict the results to the most recent 7 days; otherwise it can be dropped.
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "post_id": {
              "value": "the_post_id"
            }
          }
        },
        {
          "range": {
            "created_at": {
              "gte": "now-7d/d",
              "lt": "now/d"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "posts_per_day": {
      "date_histogram": {
        "field": "created_at",
        "calendar_interval": "day",
        "order": { "_key": "desc" }
      }
    }
  }
}
From the aggregation, pick up the first 7 buckets in your client application.
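Picking up those buckets client-side is straightforward. A minimal sketch in Python (the "response" dict below is an illustrative stand-in for the JSON body Elasticsearch returns; the "posts_per_day" aggregation name matches the query above):

```python
# Illustrative response body; in practice this comes back from Elasticsearch.
response = {
    "aggregations": {
        "posts_per_day": {
            "buckets": [
                {"key_as_string": "2022-03-07 00:00:00", "doc_count": 4},
                {"key_as_string": "2022-03-06 00:00:00", "doc_count": 9},
                {"key_as_string": "2022-03-05 00:00:00", "doc_count": 2},
            ]
        }
    }
}

# Buckets arrive newest-first because of "order": {"_key": "desc"};
# keep at most the first 7 and trim the timestamps down to dates.
counts_per_day = [
    (b["key_as_string"][:10], b["doc_count"])
    for b in response["aggregations"]["posts_per_day"]["buckets"][:7]
]
```

Note that a date_histogram only emits buckets for days that have documents (unless you set min_doc_count to 0 and an extended_bounds range), so days with zero comments may simply be absent.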
CodePudding user response:
Elasticsearch supports SQL queries as well (though they are limited), so with a small change you can use something like this:
GET _sql
{
  "query": """
    SELECT count(*) AS number, created_at AS date
    FROM comments
    WHERE post_id = 123
    GROUP BY date
    ORDER BY date DESC
    LIMIT 7
  """
}
You can see the corresponding native query by sending the same body to _sql/translate, which returns:
{
  "size": 0,
  "query": {
    "term": {
      "post_id": {
        "value": 123,
        "boost": 1.0
      }
    }
  },
  "_source": false,
  "stored_fields": "_none_",
  "aggregations": {
    "groupby": {
      "composite": {
        "size": 7,
        "sources": [
          {
            "31239": {
              "terms": {
                "field": "created_at",
                "missing_bucket": true,
                "order": "desc"
              }
            }
          }
        ]
      }
    }
  }
}
That being said, some of what appears in the translated query is not needed, so a cleaner native query would be:
{
  "query": {
    "term": {
      "post_id": {
        "value": 123
      }
    }
  },
  "aggs": {
    "unq_dates": {
      "terms": {
        "field": "created_at",
        "size": 7,
        "order": {
          "_key": "desc"
        }
      }
    }
  }
}
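One caveat with a terms aggregation on a date field: each bucket key comes back as an epoch-millisecond timestamp (alongside a key_as_string formatted per the mapping). A small Python sketch of turning such a key into a calendar date, using an illustrative bucket value:

```python
from datetime import datetime, timezone

# Illustrative bucket from a terms aggregation on a date field;
# "key" is epoch milliseconds.
bucket = {"key": 1646611200000, "doc_count": 9}

# Convert the epoch-millis key to a calendar date (UTC).
day = datetime.fromtimestamp(bucket["key"] / 1000, tz=timezone.utc).strftime("%Y-%m-%d")
```

Also note that, unlike the date_histogram approach, this groups by the exact timestamp rather than by day, so it only matches the MySQL query if created_at values are already stored at day granularity.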
CodePudding user response:
First, it pays to store the time field at the granularity you aggregate on: writing a date-only value in 'yyyy-MM-dd' format directly at index time will improve query performance and reduce resource consumption.
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "post_id": {
              "value": 1
            }
          }
        }
      ]
    }
  },
  "_source": false,
  "aggs": {
    "created_at": {
      "auto_date_histogram": {
        "field": "created_at",
        "buckets": 7,
        "format": "yyyy-MM-dd"
      }
    }
  }
}
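If you follow the advice about storing a day-granularity value, the mapping could gain a date-only field alongside the full timestamp. A sketch (the field name created_date is illustrative, not from the original mapping):

```json
{
  "mappings": {
    "_doc": {
      "properties": {
        "created_date": {
          "type": "date",
          "format": "yyyy-MM-dd"
        }
      }
    }
  }
}
```

Aggregating on such a field avoids per-query rounding of timestamps down to days.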