Data in ES:
f1 (string) | f2 (string) | id (int) |
---|---|---|
apple | abcdef | 1 |
perl | edfadfe | 2 |
perl | aefasdf | 3 |
perl | fedae | 4 |
I want to find the documents where f2[2]='f' (zero-based index), i.e. the data with id=2 and id=3.
How can I find them efficiently?
The length of f2 is fixed and may be up to 336 characters.
My solution is:
{
  "query": {
    "regexp": {
      "f2": ".{2}f.*"
    }
  }
}
But I don't think using a regexp is efficient. Is there a better option?
CodePudding user response:
I would recommend benchmarking different options to see which one best fits your dataset and usage pattern. These would include:
- your regexp
- turning "f2": "edfadfe" into "f2_1": "e", "f2_2": "d", ... in your indexing pipeline
- turning "f2": "edfadfe" into "f2_spaced": ["e", " d", " f", ...] in your indexing pipeline or a custom token filter
Both preprocessing options would be very fast at query time, but they are slower at indexing time and will use a LOT of disk space.
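
To make the per-position option more concrete, here is a minimal Python sketch of the preprocessing and the matching query. It only builds plain dictionaries and does not talk to a cluster. The helper names `explode_positions` and `position_query` are made up for illustration, the field names follow the 1-based `f2_1`, `f2_2`, ... naming from the list above (so the question's zero-based f2[2] becomes `f2_3`), and it assumes the generated fields are mapped as `keyword` so that a `term` query compares the raw character.

```python
import json


def explode_positions(doc, field="f2"):
    """Return a copy of doc with one extra field per character of doc[field].

    Hypothetical helper: field names are 1-based (f2_1, f2_2, ...).
    """
    out = dict(doc)
    for i, ch in enumerate(doc[field], start=1):
        out[f"{field}_{i}"] = ch
    return out


def position_query(field, index, char):
    """Build a term query for the character at a zero-based index.

    Assumes the per-position fields are mapped as keyword.
    """
    return {"query": {"term": {f"{field}_{index + 1}": char}}}


# Sample documents from the question.
docs = [
    {"f1": "apple", "f2": "abcdef", "id": 1},
    {"f1": "perl", "f2": "edfadfe", "id": 2},
    {"f1": "perl", "f2": "aefasdf", "id": 3},
    {"f1": "perl", "f2": "fedae", "id": 4},
]

# What the indexing pipeline would send to the index.
indexed = [explode_positions(d) for d in docs]

# Query body for f2[2] == 'f'.
query = position_query("f2", 2, "f")
print(json.dumps(query))

# Local stand-in for the term match, just to check the logic.
(target, value), = query["query"]["term"].items()
matches = [d["id"] for d in indexed if d.get(target) == value]
print(matches)
```

With the four sample documents, the local check selects ids 2 and 3, the same documents the regexp is meant to find. Whether this is actually faster than the regexp on real data is what the benchmark should answer.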