I wanted to achieve a functionality of SQL in Elasticsearch. Consider the following table:
What I wanted to achieve is aggregation on distinct value for a column, SQL equivalent:
select product,min(distinct price) from test group by product;
Expected output:
The aggregation could be any standard agg like min
, max
, sum
, avg
, count
etc.
What I've already tried is using painless script, I'm able to achieve the required output through painless script but, generating painless script is quite difficult programmatically.
What I'm looking for is either an EQL query that can be generated programmatically or some java plugin that can generate a painless script programmatically.
Edit:
Painless script that I've tried:
{
"aggs": {
"terms": {
"scripted_metric": {
"init_script": "state.rawMap = [:];",
"map_script": "def product = doc['product'].value;state.rawMap.putIfAbsent(product, new ArrayList());def price = doc['price'].value;if(!state.rawMap.get(product).contains(price)){state.rawMap.get(product).add(price);}",
"combine_script": "List outputList=new ArrayList();for (entry in state.rawMap.entrySet()) {def map=[:]; def min=entry.getValue().get(0); for(price in entry.getValue()){if(price<min){min=price;}}map.product=(entry.getKey()); map.min_price=min;outputList.add(map);}return outputList;"
}
}
}
}
CodePudding user response:
No need for Painless scripting! What you want can simple by achieved by the following aggregation query:
{
"size": 0,
"aggs": {
"products": {
"terms": {
"field": "product"
},
"aggs": {
"price_stats": {
"stats": {
"field": "price"
}
}
}
}
}
}
CodePudding user response:
Below query will give you your expected result:
POST test/_search
{
"size": 0,
"aggs": {
"Fruites": {
"terms": {
"field": "product.keyword",
"size": 10
},
"aggs": {
"min_price": {
"min": {
"field": "price"
}
}
}
}
}
}
Sample Result:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"Fruites" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "apple",
"doc_count" : 2,
"min_price" : {
"value" : 120.0
}
},
{
"key" : "pears",
"doc_count" : 2,
"min_price" : {
"value" : 150.0
}
},
{
"key" : "banana",
"doc_count" : 1,
"min_price" : {
"value" : 70.0
}
}
]
}
}
}
You can create index mapping like below:
PUT test
{
"mappings": {
"properties": {
"price": {
"type": "integer"
},
"product": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
Sample data index like below:
POST _bulk
{"index":{"_index":"test","_id":"1"}}
{"product":"apple","price":"120"}
{"index":{"_index":"test","_id":"2"}}
{"product":"banana","price":"70"}
{"index":{"_index":"test","_id":"3"}}
{"product":"pears","price":"150"}
{"index":{"_index":"test","_id":"4"}}
{"product":"apple","price":"220"}
{"index":{"_index":"test","_id":"5"}}
{"product":"apple","price":"120"}
{"index":{"_index":"test","_id":"5"}}
{"product":"pears","price":"180"}