I have three logs in ES like
{"@timestamp":"2022-07-19T11:24:16.274073 05:30","log":{"level":200,"logger":"production","message":"BUY_ITEM1","context":{"user_id":31312},"datetime":"2022-07-19T11:24:16.274073 05:30","extra":{"ip":"127.0.0.1"}}}
{"@timestamp":"2022-07-19T11:24:16.274073 05:30","log":{"level":200,"logger":"production","message":"BUY_ITEM2","context":{"user_id":31312},"datetime":"2022-07-19T11:24:16.274073 05:30","extra":{"ip":"127.0.0.1"}}}
{"@timestamp":"2022-07-19T11:24:16.274073 05:30","log":{"level":200,"logger":"production","message":"CLICK_ITEM3","context":{"user_id":31312},"datetime":"2022-07-19T11:24:16.274073 05:30","extra":{"ip":"127.0.0.1"}}}
I can get the users who bought Item1
by querying log.message: "BUY_ITEM1"
in KQL in Kibana.
How can I get user_ids who have both BUY_ITEM1
and BUY_ITEM2
?
CodePudding user response:
Tldr;
Join
query as they exist in SQL
are not really possible in Elasticsearch
, they are (very limited)[https://www.elastic.co/guide/en/elasticsearch/reference/current/joining-queries.html].
You will need to work around this issue.
Work around
You could do an aggregation on user_id
of all the product they bought.
GET /73031860/_search
{
"query": {
"terms": {
"log.message.keyword": [
"BUY_ITEM1",
"BUY_ITEM2"
]
}
},
"size": 0,
"aggs": {
"users": {
"terms": {
"field": "log.context.user_id",
"size": 10
},
"aggs": {
"products": {
"terms": {
"field": "log.message.keyword",
"size": 10
}
}
}
}
}
}
This will give you the following result
{
...
},
"aggregations": {
"users": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 31312,
"doc_count": 2,
"products": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "BUY_ITEM1",
"doc_count": 1
},
{
"key": "BUY_ITEM2",
"doc_count": 1
}
]
}
}
]
}
}
}