How to find data with specific character in specific position of a string in ElasticSearch

Time:12-25

Data in ES:

f1 (string)   f2 (string)   id (int)
apple         abcdef        1
perl          edfadfe       2
perl          aefasdf       3
perl          fedae         4

I want to find the documents where f2[2] = 'f' (counting from 0), i.e. the ones with id 2 and 3.

How can I find them efficiently?

The length of f2 is fixed and may be up to 336 characters.

My solution is:

{
    "query": {
        "regexp": {
            "f2": ".{2}f.*" 
        }
    }
}

But I don't think using a regexp would be efficient. Is there a better choice?
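
For reference, here is a minimal Python sketch of which rows the pattern above should select. It assumes f2 is indexed as a single unanalyzed term (for example a keyword field), so the regexp has to match the whole value; Python's re.fullmatch stands in for that anchoring here. The rows list and the matching_ids name are only illustrative.

```python
import re

# Sample rows from the question: (f1, f2, id)
rows = [
    ("apple", "abcdef", 1),
    ("perl", "edfadfe", 2),
    ("perl", "aefasdf", 3),
    ("perl", "fedae", 4),
]

# Same pattern as in the query: any two characters, then 'f', then anything.
pattern = re.compile(r".{2}f.*")

# fullmatch requires the pattern to cover the whole string, which mirrors
# the assumption that the regexp is matched against the entire term.
matching_ids = [doc_id for _, f2, doc_id in rows if pattern.fullmatch(f2)]
print(matching_ids)
```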

CodePudding user response:

I would recommend benchmarking different options to see which one best fits your dataset and usage pattern. These would include:

  • your regexp
  • turning "f2": "edfadfe" into "f2_1": "e", "f2_2": "d", ... in your indexing pipeline
  • turning "f2": "edfadfe" into "f2_spaced": ["e", " d", " f", ...] in your indexing pipeline or a custom token filter

Both preprocessing options would be very fast at query time, but they would be slower at indexing time and would use a LOT of disk space.
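
As a rough illustration of the per-position option, the sketch below expands f2 into one field per character before indexing and builds the matching exact-match query. The helper names expand_positions and position_query are hypothetical, the 1-based field names follow the "f2_1", "f2_2" naming used above, and the term query assumes those fields are mapped as keyword. It only builds plain dictionaries and does not talk to a cluster.

```python
def expand_positions(value, prefix="f2"):
    """Return one field per character, numbered from 1 (f2_1, f2_2, ...)."""
    return {f"{prefix}_{i}": ch for i, ch in enumerate(value, start=1)}


def position_query(index0, char, prefix="f2"):
    """Build a term query for the character at a 0-based string position."""
    # Position 2 in the string corresponds to field f2_3 (1-based naming).
    return {"query": {"term": {f"{prefix}_{index0 + 1}": char}}}


# Document as it would be sent to the index, with the extra fields added.
doc = {"f1": "perl", "f2": "edfadfe", "id": 2}
indexed_doc = {**doc, **expand_positions(doc["f2"])}

# Query equivalent to f2[2] = 'f' from the question.
query = position_query(2, "f")
print(indexed_doc)
print(query)
```

With f2 at 336 characters this would add 336 fields per document, which is where the extra indexing time and disk usage come from, so it is worth measuring against the regexp on real data.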
