Home > Software design >  BigQuery - Count how many words in array are equal
BigQuery - Count how many words in array are equal

Time:04-25

I want to count how many similar words I have in a path (which will be split at delimiter /) and return a matching array of integers.

Input data will be something like: enter image description here

I want to add another column, match_count, with an array of integers. For example: enter image description here

To replicate this case, this is the query I'm working with:

CREATE TEMP FUNCTION HOW_MANY_MATCHES_IN_PATH(src_path ARRAY<STRING>, test_path ARRAY<STRING>) RETURNS ARRAY<INTEGER> AS (
    -- WHAT DO I PUT HERE?
);

SELECT
    *,
    HOW_MANY_MATCHES_IN_PATH(src_path, test_path) as dir_path_match_count
FROM (
    SELECT
        ARRAY_AGG(x) AS src_path,
        ARRAY_AGG(y) as test_path
    FROM
        UNNEST([
            'lib/client/core.js',
            'lib/server/core.js'
        ]) AS x, UNNEST([
            'test/server/core.js'
        ]) as y
)

I've tried working with ARRAY and UNNEST in the HOW_MANY_MATCHES_IN_PATH function, but I either end up with an error or an array of 4 items (in this example)

CodePudding user response:

Consider below approach

create temp function how_many_matches_in_path(src_path string, test_path string) returns integer as (
  (select count(distinct src)
  from unnest(split(src_path, '/')) src,
  unnest(split(test_path, '/')) test
  where src = test) 
);
select *, 
  array( select how_many_matches_in_path(src, test)
    from t.src_path src with offset
    join t.test_path test with offset
    using(offset)
  ) dir_path_match_count
from your_table t             

if to apply to sample of Input data in your question

with your_table as (
  select 
    ['lib/client/core.js', 'lib/server/core.js'] src_path,
    ['test/server/core.js', 'test/server/core.js'] test_path
)     

output is

enter image description here

  • Related