How to search in ElasticSearch the most common word of a single field in a single document?

Time:10-29

Let's say I have a document with a field "pdf_content" of type keyword that contains:

"good polite nice good polite good"

I would like a result like this:

{
    "word": "good",
    "occurrences": 3
},
{
    "word": "polite",
    "occurrences": 2
},
{
    "word": "nice",
    "occurrences": 1
}

How can I do this with Elasticsearch 7.15?

I tried this in the Kibana console:

GET /pdf/_search
{
  "aggs": {
    "pdf_contents": {
      "terms": { "field": "pdf_content" }
    }
  }
}

But it only returns the list of PDFs I have indexed.

CodePudding user response:

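Have you tried term vectors? A keyword field is indexed as one single term (the whole string), so a terms aggregation on it buckets entire field values rather than individual words. Mapping the field as text with term_vector enabled lets you read per-word frequencies for a single document.

Basically, you can do the following.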

Mappings (set when creating the index):

PUT /pdf
{
    "mappings": {
        "properties": {
            "pdf_content": {
                "type": "text",
                "term_vector": "with_positions_offsets_payloads"
            }
        }
    }
}

Index your sample document:

POST /pdf/_doc/1
{
    "pdf_content": "good polite nice good polite good"
}

Then request the term vectors for that document:

GET /pdf/_termvectors/1
{
  "fields" : ["pdf_content"],
  "offsets" : false,
  "payloads" : false,
  "positions" : false,
  "term_statistics" : false,
  "field_statistics" : false
}

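Each term in the response carries a term_freq, which is the number of times that word occurs in the field of this document. Setting all of the options to false strips the response down to just those frequencies, which is what you want; set any of them to true if you want to see the other information. The API does not rank the terms by frequency, so sort on term_freq on the client side if you need the most common word first.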
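To get from the term vectors response to the word/occurrences list asked for in the question, something like the following Python sketch could work. It assumes the response has already been parsed into a dict; the function name word_counts and the hard-coded sample_response are made up for illustration, and the sample is an abbreviated version of what the request returns for the document above (real responses carry more metadata).

```python
from typing import Any


def word_counts(termvectors_response: dict[str, Any],
                field: str = "pdf_content") -> list[dict[str, Any]]:
    """Turn a _termvectors response into a list sorted by descending frequency."""
    terms = termvectors_response["term_vectors"][field]["terms"]
    # Sort by frequency (highest first), then alphabetically to break ties.
    ranked = sorted(terms.items(),
                    key=lambda item: (-item[1]["term_freq"], item[0]))
    return [{"word": word, "occurrences": info["term_freq"]}
            for word, info in ranked]


# Abbreviated, illustrative response for "good polite nice good polite good".
sample_response = {
    "_index": "pdf",
    "_id": "1",
    "found": True,
    "term_vectors": {
        "pdf_content": {
            "terms": {
                "good": {"term_freq": 3},
                "nice": {"term_freq": 1},
                "polite": {"term_freq": 2},
            }
        }
    },
}

result = word_counts(sample_response)
for entry in result:
    print(entry)
```

The first entry of the returned list is the most common word in that field of the document.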
