Home > front end >  Find largest match string in BigQuery
Find largest match string in BigQuery

Time:12-07

I want to know how to find the longest match string from a given string.

For instance lets say I have this string 1122334455 in table_a and have a table_b with a column containing those following values:

'11'
'112'
'1122'
'112233'
'1122335455'

The desire match of the string 1122334455 is 112233.

Is there any function or query that could be use the desire output?

CodePudding user response:

Match the string using like operator and sort results by length:

select *
from table_b
join table_a on table_a.str like concat('%', table_b.str, '%')
order by char_length(table_b.str) desc
limit 1

CodePudding user response:

You can join using the LIKE operator and then use a Window Function to identify which match for each tablea string is the largest from tableb.

SELECT *
FROM 
  (
    SELECT tablea.yourstring, 
      tableb.yourotherstring, MAX(LENGTH(tableb.yourotherstring)) OVER (PARTITION BY tablea.yourstring) as maxlength
    FROM tablea
      LEFT OUTER JOIN tableb 
        ON tablea.yourstring LIKE concat('%', tableb.yourotherstring, '%')  
  )
WHERE LENGTH(yourotherstring) = maxlength;

I believe Google is introducing the Qualify clause (available in Snowflake and Teradata currently) in a future release and currently available in pre-release, where this becomes a little easier to write:

SELECT *
FROM tablea
  LEFT OUTER JOIN tableb 
    ON tablea.yourstring LIKE concat('%', tableb.yourotherstring, '%')
QUALIFY LENGTH(tableb.yourotherstring) = MAX(LENGTH(tableb.yourotherstring)) OVER (PARTITION BY tablea.yourstring)
    

CodePudding user response:

Consider below (BigQuery)

select col_a as pattern, col_b as match
from table_a, unnest(generate_array(1, length(col_a))) pos
join table_b
on starts_with(substr(col_a, pos), col_b)
where true
qualify 1 = row_number() over(partition by col_a order by length(col_b) desc)    

if applied to dummy data as in your question

with table_a as (
  select '1122334455' col_a union all 
  select '1123334455'
), table_b as (
  select '11' col_b union all
  select '112' union all
  select '1122' union all
  select '112233' union all
  select '1122335455' 
)     

the output is

enter image description here

Note: I purposely NOT using LIKE as it is quite expensive string function, which can be quite expensive on big volumes ...

  • Related