Home > Software engineering >  Count the number of matches in an array in BigQuery
Count the number of matches in an array in BigQuery

Time:05-22

How I can count the number of matches in an array? For example, for numbers [1,3] in the array [1,2,3] there will be 2 matches, and for the array [1,2] there will be 1 match. Right now I can only check if [1,3] is in the array or not.

  WITH `arrays` AS (
  SELECT 1 id, [1,2,3] as arr
  UNION ALL
  SELECT 2, [1,2]
  UNION ALL
  SELECT 3, [3]

)
SELECT id, arr, [1,3] as numbers,
  CASE 
   1 IN UNNEST(arr) and 
   3 IN UNNEST(arr) 
  WHEN TRUE THEN 'numbers is in array'
  ELSE 'numbers is not in array'
  END conclusion

FROM `arrays` 

I'm trying to get such result:

enter image description here

CodePudding user response:

Consider below approach

with `arrays` as (
  select 1 id, [1,2,3] as arr  union all
  select 2, [1,2]  union all
  select 3, [3]
)
select *,
  ( select count(*)
    from t.numbers num join t.arr num
    using(num)
  ) check,
  ( select format('number is %sin array',
      if(logical_and(if(num2 is null, false, true)), '', 'not '))
    from t.numbers num1 left join t.arr num2
    on num1 = num2
  ) conclusion
from (
  select id, arr, [1,3] as numbers
  from `arrays` 
) t               

with output

enter image description here

CodePudding user response:

Using a math, following seems to be possible:

  • If union of arr and numbers is same as arr, it will be numbers is in array
  • If union of arr and numbers is greater than arr, elements as much as the increased number is not in the arr.
  • So, numbers_len - (union_len - arr_len) will be check
WITH `arrays` AS (
  SELECT 1 id, [1,2,3] as arr
  UNION ALL
  SELECT 2, [1,2]
  UNION ALL
  SELECT 3, [3]
),
calculated_arrays AS (
  SELECT *, [1,3] as numbers,
         ARRAY_LENGTH(ARRAY(SELECT DISTINCT * FROM UNNEST(arr || [1, 3]))) AS union_len,
         ARRAY_LENGTH(arr) AS arr_len,
         ARRAY_LENGTH([1, 3]) AS numbers_len
    FROM `arrays`
)
SELECT id, arr, numbers,
       numbers_len - union_len   arr_len AS check, 
       IF (union_len = arr_len, 'numbers is in array', 'numbers is not in array') AS conclusion
  FROM calculated_arrays
;

output:

enter image description here

  • Related