Match any letter in a string-CodePudding

Table flights:

ID	Path
1	NZ:EU
2	JP:CA

SELECT
    path
FROM
    flights
WHERE
    path ILIKE '%' || 'jpca' || '%'

Above query isn't working, it needs to return second row. But it works if I provide for example:

jp
ca
p
jp:ca
CA

It should also work for:

jpca
pj
cp
a:p

Regex answer is also accepted.

CodePudding user response：

It's much easier if you process the column path before the match

Match individual characters

(Updated question.)

Assumptions:

All characters are significant, including punctuation.
A pattern matches if every character is found in the path.
Match case-insensitive.

Lower-case both operands and treat them as arrays.
If there can be duplicate letters, eliminate them for efficiency.

SELECT path
FROM   flights
WHERE  string_to_array(lower(path), null)
    @> string_to_array(lower('JPCA'), null);

Or:

...
WHERE  string_to_array(lower(path), null)  @> '{j,p,c,a}';

Returns all rows where path contains every character in the search patter.

@> is the array "contains" operator.

If the table is big, support it with an GIN index on the expression to make that fast (which is the point of this route):

CREATE INDEX flights_path_array_lower_gin_idx ON flights
USING gin (string_to_array(lower(path), null));

Related, with links to more:

GIN index on array column not used, even after setting `enable_seqscan` to off?

If you don't need index support, a simper check will do:

...
WHERE path ~* ALL (string_to_array('JPCA', null))

~* is the case-insensitive regular expression match operator.

Substring matching

(Original question.)

Assumptions:

The sequence of characters in the search term must be matched.
Only ASCII letters are significant
Duplicate characters are significant

SELECT path
FROM   flights
WHERE  lower(regexp_replace(path, '[^a-zA-Z]', '', 'g')) ~ lower('JPCA');

This removes all characters except A-Z and a-z and casts the result to lower case before trying a regular expression match. Related:

If your table is big and you need it to be fast, create a trigram expression index:

CREATE INDEX flights_path_expr_idx ON flights
USING gin (lower(regexp_replace(path, '[^a-zA-Z]', '', 'g') gin_trgm_ops);

Requires the additional module pg_trgm installed. See:

PostgreSQL LIKE query performance variations

Or add a "generated column" to your table and a plain B-tree index on that:

ALTER TABLE flights 
  ADD COLUMN path_ascii text GENERATED ALWAYS AS (lower(regexp_replace(path, '[^a-zA-Z]', '', 'g'))) STORED;

CREATE INDEX flights_path_ascii_trgm_idx ON flights USING gin (path_ascii gin_trgm_ops);

Then:

SELECT path FROM flights WHERE path_ascii ~ 'jpca';

See:

Computed / calculated / virtual / derived columns in PostgreSQL

CodePudding user response：

If you don't care for the characters of your term to occur in exactly that sequence, but only want to specify a search that matches each of these characters somewhere, you can use the ALL keyword to match multiple ILIKE patterns at once:

SELECT path
FROM flights
WHERE path ILIKE ALL( ARRAY['%j%', '%p%', '%c%', '%a%'] );

Now to generate that array from just a single string, you can use

SELECT *
FROM flights
WHERE path ILIKE ALL (SELECT '%' || regexp_split_to_table('jpca', '') || '%');

^{(online demo)}