Home > Back-end >  Match any letter in a string
Match any letter in a string

Time:05-16

Table flights:

ID Path
1 NZ:EU
2 JP:CA
SELECT
    path
FROM
    flights
WHERE
    path ILIKE '%' || 'jpca' || '%'

Above query isn't working, it needs to return second row. But it works if I provide for example:

  • jp
  • ca
  • p
  • jp:ca
  • CA

It should also work for:

  • jpca
  • pj
  • cp
  • a:p

Regex answer is also accepted.

CodePudding user response:

It's much easier if you process the column path before the match

Match individual characters

(Updated question.)

Assumptions:

  • All characters are significant, including punctuation.
  • A pattern matches if every character is found in the path.
  • Match case-insensitive.

Lower-case both operands and treat them as arrays.
If there can be duplicate letters, eliminate them for efficiency.

SELECT path
FROM   flights
WHERE  string_to_array(lower(path), null)
    @> string_to_array(lower('JPCA'), null);

Or:

...
WHERE  string_to_array(lower(path), null)  @> '{j,p,c,a}';

Returns all rows where path contains every character in the search patter.

@> is the array "contains" operator.

If the table is big, support it with an GIN index on the expression to make that fast (which is the point of this route):

CREATE INDEX flights_path_array_lower_gin_idx ON flights
USING gin (string_to_array(lower(path), null));

Related, with links to more:

If you don't need index support, a simper check will do:

...
WHERE path ~* ALL (string_to_array('JPCA', null))

~* is the case-insensitive regular expression match operator.

Related:

Substring matching

(Original question.)

Assumptions:

  • The sequence of characters in the search term must be matched.
  • Only ASCII letters are significant
  • Duplicate characters are significant
SELECT path
FROM   flights
WHERE  lower(regexp_replace(path, '[^a-zA-Z]', '', 'g')) ~ lower('JPCA');

This removes all characters except A-Z and a-z and casts the result to lower case before trying a regular expression match. Related:

If your table is big and you need it to be fast, create a trigram expression index:

CREATE INDEX flights_path_expr_idx ON flights
USING gin (lower(regexp_replace(path, '[^a-zA-Z]', '', 'g') gin_trgm_ops);

Requires the additional module pg_trgm installed. See:

Or add a "generated column" to your table and a plain B-tree index on that:

ALTER TABLE flights 
  ADD COLUMN path_ascii text GENERATED ALWAYS AS (lower(regexp_replace(path, '[^a-zA-Z]', '', 'g'))) STORED;

CREATE INDEX flights_path_ascii_trgm_idx ON flights USING gin (path_ascii gin_trgm_ops);

Then:

SELECT path FROM flights WHERE path_ascii ~ 'jpca';

See:

CodePudding user response:

If you don't care for the characters of your term to occur in exactly that sequence, but only want to specify a search that matches each of these characters somewhere, you can use the ALL keyword to match multiple ILIKE patterns at once:

SELECT path
FROM flights
WHERE path ILIKE ALL( ARRAY['%j%', '%p%', '%c%', '%a%'] );

Now to generate that array from just a single string, you can use

SELECT *
FROM flights
WHERE path ILIKE ALL (SELECT '%' || regexp_split_to_table('jpca', '') || '%');

(online demo)

  • Related