Home > database >  Simple regex matching in BigQuery not working
Simple regex matching in BigQuery not working

Time:09-29

I have two tables available in BigQuery:

  • my-project.my-database.what-to-query:
 --------- ----------- 
| id_what | name_what |
 --------- ----------- 
|    1    |   C       |
 --------- ----------- 
|    2    |   Foo     |
 --------- ----------- 
  • my-project.my-database.where-to-query:
 ---------- ---------------------- 
| id_where |      name_where      |
 ---------- ---------------------- 
|    A     | C   is important     |
 ---------- ---------------------- 
|    B     | Foo Fighters is nice |
 ---------- ---------------------- 
|    C     | I know C# and C      |
 ---------- ---------------------- 
|    D     | Football is cool     |
 ---------- ---------------------- 
|    E     | Don't have anything  |
 ---------- ---------------------- 

I would like to use name_what as a REGEX search keyword, to obtain all the matches in name_where, while keeping all the columns. The result should look like:

 --------- ----------- ---------- ---------------------- 
| id_what | name_what | id_where |      name_where      |
 --------- ----------- ---------- ---------------------- 
|    1    |   C       |    A     | C   is important     |
 --------- ----------- ---------- ---------------------- 
|    1    |   C       |    C     | I know C# and C      |
 --------- ----------- ---------- ---------------------- 
|    2    |   Foo     |    B     | Foo Fighters is nice |
 --------- ----------- ---------- ---------------------- 
|    2    |   Foo     |    D     | Football is cool     |
 --------- ----------- ---------- ---------------------- 

Notice how C should be escaped, something like:

SELECT *
FROM `my-project.my-database.where-to-query`
WHERE REGEXP_CONTAINS(name, r"C\ \ ")

BUT the thing is that column name_what could keep several OTHER strings (i.e., IRL, both tables contain hundreds of thousands of rows, this is only a toy sample), which would contain OTHER RegEx special characters. In Python for instance, you have enter image description here

CodePudding user response:

Pratik's solution is the way to go. Meantime, consider also below option

SELECT c.id_what, c.name_what, s.id_where, s.name_where
FROM `my-project.my-database.what-to-query` c, `my-project.my-database.where-to-query` s
WHERE s.name_where LIKE '%' || c.name_what || '%'       

with output

enter image description here

  • Related