Filter using string with any possible content in BigQuery

Time:10-04

I am a newbie using BigQuery.

I am building a query that I will share with several other people. Each person is responsible for different business units, and I want them to be able to easily insert the names of their business units into this query.

I built something like this, and it works fine from what I tested:

DECLARE business_units array<string>;

SET business_units = ["unit_A", "unit_C", "unit_D"];
    
SELECT *
FROM dataset
WHERE bu_name IN UNNEST(business_units)

Problem

I also want to be able to easily change that query in order to search for all possible business units.

Ideally, I just want to change the "SET" line. I tried different things, but none of them seem to work. I believe I need to use metacharacters or a regular expression, but I have not been able to find the right combination. I have already looked into the BigQuery documentation, but I could not work out how to do this.

I have tried things like:

SET business_units = ["."];
SET business_units = ["*"];
SET business_units = ["\."];
SET business_units = ["%%"];

When I use any of these, the result comes back empty.

Could someone point me in the right direction, please?

CodePudding user response:

IN cannot process a list using LIKE or regular expressions, and LIKE or regular expressions can't take arrays as parameters.

The straightforward approach is to just use a JOIN on your unnested list.
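
To see why the attempts in the question return nothing, here is a minimal sketch (the literal values are only illustrative). IN UNNEST tests each array element for equality, so "%" is treated as an ordinary one-character string, while LIKE interprets it as a wildcard.

```sql
-- IN compares for equality; LIKE interprets % as "any sequence of characters".
SELECT
  "unit_A" IN UNNEST(["%"]) AS matched_by_in,   -- FALSE: "unit_A" is not equal to "%"
  "unit_A" LIKE "%"         AS matched_by_like; -- TRUE: % matches any string
```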

DECLARE business_units array<string>;

SET business_units = ["unit_A", "unit_C", "unit_D"];
    
SELECT
  *
FROM
  dataset
INNER JOIN
  UNNEST(business_units)  AS param_pattern
    ON dataset.bu_name LIKE param_pattern

If a row matches more than one element in the array, you'll get duplication (each dataset row joined with every pattern that it matches).

How you deal with that is up to you. You might just use SELECT DISTINCT dataset.*, but your question doesn't cover that. (If you're unsure how to proceed with that, open another question once you have this part working.)
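
With a LIKE-based match, searching all business units only takes a change to the SET line: a single "%" pattern matches any string. The following is a sketch rather than a definitive solution. It assumes the bu_name column and the placeholder table name dataset from the question, and it swaps the JOIN for EXISTS so that a row matching several patterns is still returned only once.

```sql
DECLARE business_units ARRAY<STRING>;

-- "%" matches any string, so this one-element list selects every business unit.
-- For specific units, list them instead, e.g. ["unit_A", "unit_C"].
SET business_units = ["%"];

SELECT *
FROM dataset  -- placeholder table name, as in the question
WHERE EXISTS (
  -- True as soon as one pattern matches, so the row is returned once.
  SELECT 1
  FROM UNNEST(business_units) AS param_pattern
  WHERE dataset.bu_name LIKE param_pattern
);
```

Two things to keep in mind: rows where bu_name is NULL are not matched even by "%", and the underscore is itself a LIKE wildcard for a single character, so the pattern "unit_A" would also match a name such as "unitXA". If that matters, the underscore should be escapable as "unit\\_A".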

CodePudding user response:

There are many options for you here. I will show you the ones that need minimal changes to your original solution.

Option #1

DECLARE business_units array<string>;

SET business_units = ["unit_A", "unit_C", "unit_D", "ALL_UNITS"]; 

SELECT *
FROM dataset
WHERE bu_name IN UNNEST(business_units)
OR "ALL_UNITS" IN UNNEST(business_units);   

As you can see, when you want all units, just add "ALL_UNITS" to the list in your SET line.

Option #2

DECLARE business_units array<string>;
DECLARE all_units boolean;

SET business_units = ["unit_A", "unit_C", "unit_D"]; 
SET all_units = TRUE;

SELECT *
FROM dataset
WHERE bu_name IN UNNEST(business_units)
OR all_units;     

Here you have one more parameter, all_units. When you want to see all units, set it to TRUE; otherwise set it to FALSE.
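
A further variant in the same spirit, which is not from the answers above and is only a sketch: treat an empty array as "all units", so that the SET line stays the only thing to edit. The table name dataset is the placeholder from the question.

```sql
DECLARE business_units ARRAY<STRING>;

-- Leave the array empty to select all business units,
-- or list specific units, e.g. ["unit_A", "unit_C", "unit_D"].
SET business_units = ARRAY<STRING>[];

SELECT *
FROM dataset  -- placeholder table name, as in the question
WHERE ARRAY_LENGTH(business_units) = 0
   OR bu_name IN UNNEST(business_units);
```

When the array is empty, the first condition is true for every row, so rows with a NULL bu_name are returned as well.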
