I want to know how to find the longest match string from a given string.
For instance lets say I have this string 1122334455
in table_a
and have a table_b
with a column containing those following values:
'11'
'112'
'1122'
'112233'
'1122335455'
The desire match of the string 1122334455
is 112233
.
Is there any function or query that could be use the desire output?
CodePudding user response:
Match the string using like operator and sort results by length:
select *
from table_b
join table_a on table_a.str like concat('%', table_b.str, '%')
order by char_length(table_b.str) desc
limit 1
CodePudding user response:
You can join using the LIKE
operator and then use a Window Function to identify which match for each tablea
string is the largest from tableb
.
SELECT *
FROM
(
SELECT tablea.yourstring,
tableb.yourotherstring, MAX(LENGTH(tableb.yourotherstring)) OVER (PARTITION BY tablea.yourstring) as maxlength
FROM tablea
LEFT OUTER JOIN tableb
ON tablea.yourstring LIKE concat('%', tableb.yourotherstring, '%')
)
WHERE LENGTH(yourotherstring) = maxlength;
I believe Google is introducing the Qualify clause (available in Snowflake and Teradata currently) in a future release and currently available in pre-release, where this becomes a little easier to write:
SELECT *
FROM tablea
LEFT OUTER JOIN tableb
ON tablea.yourstring LIKE concat('%', tableb.yourotherstring, '%')
QUALIFY LENGTH(tableb.yourotherstring) = MAX(LENGTH(tableb.yourotherstring)) OVER (PARTITION BY tablea.yourstring)
CodePudding user response:
Consider below (BigQuery)
select col_a as pattern, col_b as match
from table_a, unnest(generate_array(1, length(col_a))) pos
join table_b
on starts_with(substr(col_a, pos), col_b)
where true
qualify 1 = row_number() over(partition by col_a order by length(col_b) desc)
if applied to dummy data as in your question
with table_a as (
select '1122334455' col_a union all
select '1123334455'
), table_b as (
select '11' col_b union all
select '112' union all
select '1122' union all
select '112233' union all
select '1122335455'
)
the output is
Note: I purposely NOT using LIKE as it is quite expensive string function, which can be quite expensive on big volumes ...