I've been seeing a lot of examples around like this one: postgres full text search like operator
They all specify that you can do a prefix search like this:
SELECT *
FROM eventlogging
WHERE description_tsv @@ to_tsquery('mess:*');
and it will retrieve a word like: "message"
However, what I do not see anywhere is whether or not there is a way to search for different parts of a word, such as a suffix?
The example that I am having trouble with right now is this:
CREATE TABLE IF NOT EXISTS project (
id VARCHAR NOT NULL,
org_name VARCHAR NOT NULL DEFAULT '',
project_name VARCHAR NOT NULL DEFAULT ''
);
insert into project(id, org_name, project_name) values ('123', 'org', 'proj');
insert into project(id, org_name, project_name) values ('456', 'huh', 'org');
insert into project(id, org_name, project_name) values ('789', 'orgs', 'project');
CREATE OR REPLACE FUNCTION get_projects(query_in VARCHAR)
RETURNS TABLE (id VARCHAR, org_name VARCHAR, project_name VARCHAR) AS $$
BEGIN
RETURN QUERY
SELECT * FROM project WHERE (
to_tsvector('simple', coalesce(project.project_name, '')) ||
to_tsvector('simple', coalesce(project.org_name, ''))
) @@ to_tsquery('simple', query_in);
END;
$$ LANGUAGE plpgsql;
The following example returns:
select * from get_projects('org');
id org_name project_name
----------------------------
123 org proj
456 huh org
My question is: why does it not return orgs
? Similarly, if I search for proj
, I only get the project named "proj" but not the one named "project."
Bonus points: how can I get results if I search for a substring? For example, if I search for the string jec
, I would like to get back the project named project
. I'm not really looking for fuzzy searching, but I would say that I am looking for substring searching.
Am I completely wrong to be using to_tsquery
? I also tried plainto_tsquery
and I tried using english
instead of simple
, but several references said to stick with simple.
CodePudding user response:
Full text search is different from substring search. Full text search is about searching whole words, omitting frequent words from indexing, ignoring inflection and the like. PostgreSQL full text search extends that somewhat by allowing prefix searches.
To search for substrings, you have to search with a condition like
WHERE word ~ 'suffix\M'
(This would be a suffix search with the regular expression matching operator ~
.)
To speed up a search like that, create a trigram index:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX ON tab USING gin (doc gin_trgm_ops);
CodePudding user response:
So-called prefix searching doesn't really thematically belong in full text searching. I think it was tossed in because, given that tokens would be stored in a btree anyway, adding that "feature" was free. No other types of partial matching are mentioned in the context of FTS because they don't exist.
You discuss the partial matching that does exist with FTS, the :*
notation. But then in your example, you don't actually use it. That is why you don't see it working, because you don't use it. If you do use it, it does work:
select * from get_projects('org:*');
But given your description, it sounds like you don't want FTS in the first place. You want LIKE or regex, perhaps with index support from pg_trgm.
but several references said to stick with simple.
It is hard to know how good the judgement of anonymous references are, but if you only want to use 'simple' than most likely you shouldn't be using FTS in the first place. 'simple' is useful for analyzing or learning or debugging real FTS situations, and can be used as a baseline for building up more complex configurations.