I'm still learning to use DSL queries in ElasticSearch. I have documents where one field is a list. I need to count the number of documents that have one element in this field, two elements in this field, etc. For example, here is a document structure:
Document1:
"Volume": [
{
"partition": "s1",
"fieldtype": ["A","B"]
}
]
Document 2:
"Volume": [
{
"partition": "s1",
"fieldtype": ["A"]
}
]
Document 3:
"Volume": [
{
"partition": "s1",
"fieldtype": ["B"]
}
]
I need a way to calculate that there is one document with 2 elements in fieldtype field and 2 documents with one element in fieldtype. If I try to aggregate them like this:
"size":0,
"aggs": {
"name": {
"terms": {
"field": "fieldtype.keyword"
}
}
}
I get counts of elements (number of As and Bs). Without using keyword, I get an error.
CodePudding user response:
I imagined that you work with nested type. Below is my solution:
PUT idx_test
{
"mappings": {
"properties": {
"Volume": {
"type": "nested"
}
}
}
}
POST idx_test/_bulk
{"index":{ "_id": 1}}
{"Volume":[{"partition": "s1","fieldtype": ["A","B"]}]}
{"index":{ "_id": 2}}
{"Volume":[{"partition": "s1","fieldtype": ["A"]}]}
{"index":{ "_id": 3}}
{"Volume":[{"partition": "s1","fieldtype": ["B"]}]}
GET idx_test/_search
{
"size": 0,
"aggs": {
"doc_id": {
"terms": {
"field": "_id",
"size": 10
},
"aggs": {
"volumes": {
"nested": {
"path": "Volume"
},
"aggs": {
"size": {
"sum": {
"script": {
"lang": "painless",
"source": "doc['Volume.fieldtype.keyword'].size()"
}
}
}
}
}
}
}
}
}
Respons:
"aggregations" : {
"doc_id" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "1",
"doc_count" : 1,
"volumes" : {
"doc_count" : 1,
"size" : {
"value" : 2.0
}
}
},
{
"key" : "2",
"doc_count" : 1,
"volumes" : {
"doc_count" : 1,
"size" : {
"value" : 1.0
}
}
},
{
"key" : "3",
"doc_count" : 1,
"volumes" : {
"doc_count" : 1,
"size" : {
"value" : 1.0
}
}
}
]
}
}
CodePudding user response:
@rabbitbr provided a good answer, but I could not understand why we tried to use a nested field. And, I think we need to use terms aggregation instead of sum here. Anyhow, here is a solution without nested :
PUT idx_test
POST idx_test/_bulk
{"index":{ "_id": 1}}
{"Volume":[{"partition": "s1","fieldtype": ["A","B"]}]}
{"index":{ "_id": 2}}
{"Volume":[{"partition": "s1","fieldtype": ["A"]}]}
{"index":{ "_id": 3}}
{"Volume":[{"partition": "s1","fieldtype": ["B"]}]}
GET idx_test/_mapping
GET idx_test/_search
{
"size": 0,
"aggs": {
"size": {
"terms": {
"script": {
"lang": "painless",
"source": "doc['Volume.fieldtype.keyword'].size()"
}
}
}
}
}
Without using keyword, I get an error.
This is normal because without keyword
you are trying to build an aggregation on a field whose type is text
.
Here the response for the query above which is pretty basic query :
{
....
"aggregations": {
"size": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "1",
"doc_count": 2
},
{
"key": "2",
"doc_count": 1
}
]
}
}
}
As you can see, we have 2 documents with 1 sized array and we have 1 document with 2 sized array.