When using Postgres you can index a string in a database field as a vector using tsvector (https://www.postgresql.org/docs/10/datatype-textsearch.html#DATATYPE-TSVECTOR).
Is there a similar concept for Elasticsearch?
CodePudding user response:
It's pretty much what ES does under the hood when you index a string into a text field.
Let's take the first example from the link you provided: a fat cat sat on a mat and ate a fat rat
With the PG tsvector type, a plain cast stores the following tokens, sorted and with duplicates removed (no stemming or stopword removal happens unless you go through to_tsvector):
a and ate cat fat mat on rat sat
If you want to keep positions with a plain cast, you need to specify them, like this:
a:1 fat:2 cat:3 sat:4 on:5 a:6 mat:7 and:8 ate:9 a:10 fat:11 rat:12
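To make the two forms concrete, here is a small Python sketch that mimics what a tsvector holds for that sentence. It is only a model of the behaviour described above, not Postgres code, and the function names are made up for illustration.

```python
from collections import defaultdict

TEXT = "a fat cat sat on a mat and ate a fat rat"


def lexemes_only(text):
    """Sorted, de-duplicated tokens, like a plain cast to tsvector."""
    return sorted(set(text.split()))


def lexemes_with_positions(text):
    """Map each token to its 1-based positions, keyed in sorted token order."""
    positions = defaultdict(list)
    for pos, token in enumerate(text.split(), start=1):
        positions[token].append(pos)
    return dict(sorted(positions.items()))


print(lexemes_only(TEXT))
print(lexemes_with_positions(TEXT))
```

The second function shows why positions matter: a repeated word such as fat keeps every place it occurred (2 and 11) instead of collapsing into a single entry.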
With ES, by contrast, positions are recorded automatically without you having to specify them. It is also possible to tell ES not to record them (to save space).
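As a sketch of how turning positions off could look, the mapping below sets the index_options parameter to docs on a text field, so that only document ids are indexed and term frequencies and positions are dropped. The index and field names (articles, body) are made up for the example; the body is written as a Python dict that would be sent as JSON when creating the index.

```python
import json

# Hypothetical field name "body". index_options is a real mapping parameter
# whose values are "docs", "freqs", "positions" (the default for text fields)
# and "offsets".
mapping = {
    "mappings": {
        "properties": {
            "body": {
                "type": "text",
                "index_options": "docs",  # no term frequencies, no positions
            }
        }
    }
}

# JSON body for a create-index request, e.g. PUT /articles
print(json.dumps(mapping, indent=2))
```

Keep in mind that phrase queries rely on positions, so they stop working on a field mapped this way.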
With the ES text type and the standard analyzer, the following tokens are produced and indexed:
a fat cat sat on a mat and ate a fat rat
With the english analyzer, we get this (stopwords removed, words stemmed, etc.):
fat cat sat mat at fat rat
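If you want to check this yourself, the _analyze API returns the tokens an analyzer produces for a given text. The Python sketch below builds that request with the standard library only; it assumes a node reachable at http://localhost:9200 without authentication, and the helper names are made up for the example.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node, no authentication
TEXT = "a fat cat sat on a mat and ate a fat rat"


def build_analyze_request(analyzer, text, base_url=ES_URL):
    """Build (but do not send) a POST /_analyze request."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(analyzer, text):
    """Send the request and return the token strings from the response."""
    with urllib.request.urlopen(build_analyze_request(analyzer, text)) as resp:
        return [t["token"] for t in json.load(resp)["tokens"]]


# With a node running, compare the two analyzers like this:
#   analyze("standard", TEXT)
#   analyze("english", TEXT)
```

Each entry in the response also carries a position and start/end offsets, which is the information ES records without being asked.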
ES doesn't sort the tokens alphabetically, since that wouldn't really help with free-text search. It also doesn't remove duplicates (although it is possible to do so, e.g. with the unique token filter), because that would interfere with the token frequency in the document and in the index, and hence with scoring.
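A tiny Python illustration of the duplicates point, using the english analyzer tokens from the example above. It only counts tokens and is not how Lucene actually computes scores.

```python
from collections import Counter

# Tokens the english analyzer produced for the example sentence.
tokens = ["fat", "cat", "sat", "mat", "at", "fat", "rat"]

term_freq = Counter(tokens)     # duplicates kept: frequency information survives
deduped = Counter(set(tokens))  # duplicates removed: every term counts once

print(term_freq["fat"])  # 2
print(deduped["fat"])    # 1
```

Once duplicates are gone, a document mentioning fat twice looks the same as one mentioning it once, which is the frequency signal that scoring depends on.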
Basically, both index pretty much the same tokens, although ES is a search engine at heart and is more optimized for this. Compared with the tsquery type, free-text searches in ES are also a bit more user-friendly.