Table flights
:
ID | Path |
---|---|
1 | NZ:EU |
2 | JP:CA |
SELECT
path
FROM
flights
WHERE
path ILIKE '%' || 'jpca' || '%'
Above query isn't working, it needs to return second row. But it works if I provide for example:
- jp
- ca
- p
- jp:ca
- CA
It should also work for:
- jpca
- pj
- cp
- a:p
Regex answer is also accepted.
CodePudding user response:
It's much easier if you process the column path
before the match
Match individual characters
(Updated question.)
Assumptions:
- All characters are significant, including punctuation.
- A pattern matches if every character is found in the
path
. - Match case-insensitive.
Lower-case both operands and treat them as arrays.
If there can be duplicate letters, eliminate them for efficiency.
SELECT path
FROM flights
WHERE string_to_array(lower(path), null)
@> string_to_array(lower('JPCA'), null);
Or:
...
WHERE string_to_array(lower(path), null) @> '{j,p,c,a}';
Returns all rows where path contains every character in the search patter.
@>
is the array "contains" operator.
If the table is big, support it with an GIN index on the expression to make that fast (which is the point of this route):
CREATE INDEX flights_path_array_lower_gin_idx ON flights
USING gin (string_to_array(lower(path), null));
Related, with links to more:
If you don't need index support, a simper check will do:
...
WHERE path ~* ALL (string_to_array('JPCA', null))
~*
is the case-insensitive regular expression match operator.
Related:
- Difference between LIKE and ~ in Postgres
- Check if value exists in Postgres array
- Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
- Escape function for regular expression or LIKE patterns
Substring matching
(Original question.)
Assumptions:
- The sequence of characters in the search term must be matched.
- Only ASCII letters are significant
- Duplicate characters are significant
SELECT path
FROM flights
WHERE lower(regexp_replace(path, '[^a-zA-Z]', '', 'g')) ~ lower('JPCA');
This removes all characters except A-Z and a-z and casts the result to lower case before trying a regular expression match. Related:
If your table is big and you need it to be fast, create a trigram expression index:
CREATE INDEX flights_path_expr_idx ON flights
USING gin (lower(regexp_replace(path, '[^a-zA-Z]', '', 'g') gin_trgm_ops);
Requires the additional module pg_trgm
installed. See:
Or add a "generated column" to your table and a plain B-tree index on that:
ALTER TABLE flights
ADD COLUMN path_ascii text GENERATED ALWAYS AS (lower(regexp_replace(path, '[^a-zA-Z]', '', 'g'))) STORED;
CREATE INDEX flights_path_ascii_trgm_idx ON flights USING gin (path_ascii gin_trgm_ops);
Then:
SELECT path FROM flights WHERE path_ascii ~ 'jpca';
See:
CodePudding user response:
If you don't care for the characters of your term to occur in exactly that sequence, but only want to specify a search that matches each of these characters somewhere, you can use the ALL
keyword to match multiple ILIKE
patterns at once:
SELECT path
FROM flights
WHERE path ILIKE ALL( ARRAY['%j%', '%p%', '%c%', '%a%'] );
Now to generate that array from just a single string, you can use
SELECT *
FROM flights
WHERE path ILIKE ALL (SELECT '%' || regexp_split_to_table('jpca', '') || '%');