Count the frequency of words used in a text field

Time:02-19

I'm an Elastic beginner and I have trouble understanding how to find the most popular search terms used by my users.

Each time a user searches for something, Logstash enters a document such as this in Elastic:

{
  "_index" : "user_searches-2022.02.14",
  "_type" : "doc",
  "_id" : "xGQA-H4BVgDEPVU6QZPf",
  "_score" : 1.0,
  "_source" : {
    "message" : """[Large line in apache combined log format]""",
    "@timestamp" : "2022-02-14T11:31:13.395Z",
    "search_string": "hello world",
    "search_terms" : ["hello", "world"]
  }
},

The search_string is extracted from the URL; search_terms is the search_string split into words (only one of these is needed, but I'm not yet certain which one).

I can't figure out which query will give me the counts of the search terms. I've had some success using "significant_text": {"field": "search_string"}, but it treats the whole string as a single term instead of splitting it into words. _termvectors, on the other hand, appears to work only on a single document, not on the entire index.

CodePudding user response:

I assume you want to count hello and world separately, and that the type of search_terms is text in your mapping. If so, setting fielddata to true in the mapping for the search_terms field lets you use a terms aggregation like the one below to get the count of each word.

https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html#enable-fielddata-text-fields

{
  "size": 0,
  "aggs": {
    "asd": {
      "terms": {
        "field": "search_terms",
        "size": 10
      }
    }
  }
}

Note that using fielddata=true on text fields can cause high memory usage.

If the search_terms field's type is keyword in the index mapping, you should be able to get the counts with the above query without setting fielddata.
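
For reference, the fielddata setting mentioned above could be enabled with a mapping update along these lines. This is only a sketch: it assumes search_terms is already mapped as text, and it assumes the typeless mapping API of Elasticsearch 7.x or later (the "_type" : "doc" in the question suggests an older cluster may be in use, in which case the type name would probably have to appear in the path). The index pattern is taken from the question.

```
PUT /user_searches-*/_mapping
{
  "properties": {
    "search_terms": {
      "type": "text",
      "fielddata": true
    }
  }
}
```

A mapping update like this only touches indices that already exist. Since the question shows daily indices, the same setting would presumably also have to go into the index template used for new ones.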

CodePudding user response:

Here's how I did it in the end, without changing anything else:

GET /user_searches-*/_search
{
  "size": 0,
  "aggs": {
    "search_term_count": {
      "terms": {
        "field": "search_terms.keyword"
      }
    }
  }
}
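
This query most likely works because search_terms was mapped dynamically: by default Elasticsearch maps new string fields as text with a keyword sub-field, and terms aggregations can run on that sub-field without fielddata. One way to confirm that the sub-field exists is the get field mapping API, sketched here under the assumption of the typeless 7.x-style endpoint and the index pattern from the question:

```
GET /user_searches-*/_mapping/field/search_terms*
```

If search_terms.keyword shows up with type keyword, the aggregation above applies as written. Two things to keep in mind: the terms aggregation returns only the top 10 buckets unless "size" is set, and each bucket's doc_count is the number of documents containing the term, so a word repeated within one search is counted once.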