Home > Software engineering >  How to get all overlapping (ordered) 3-tuples from an array in BigQuery
How to get all overlapping (ordered) 3-tuples from an array in BigQuery

Time:10-28

Given a table like the following

elems
['a', 'b', 'c', 'd', 'e']
['v', 'w', 'x', 'y']

I'd like to transform it into something like this:

tuple
['a', 'b', 'c']
['b', 'c', 'd']
['c', 'd', 'e']
['v', 'w', 'x']
['w', 'x', 'y']

I.e., I'd like to get all overlapping 3-tuples.

My current attempt looks as follows:

WITH foo AS (
  SELECT ['a', 'b', 'c', 'd', 'e'] AS elems UNION ALL
  SELECT ['v', 'w', 'x', 'y']),
single AS (
  SELECT * FROM
    foo,
    UNNEST(elems) elem
),
tuples AS (
  SELECT ARRAY_AGG(elem) OVER (ROWS BETWEEN 2 PRECEDING AND 0 FOLLOWING) AS tuple
  FROM single
)
SELECT * FROM tuples
WHERE ARRAY_LENGTH(tuple) >= 3

But the problem is, it returns some unwanted rows too, i.e., the ones that are "between" the original rows from the foo table.

tuple
['a', 'b', 'c']
['b', 'c', 'd']
['c', 'd', 'e']
['d', 'e', 'v'] <--- unwanted
['e', 'v', 'w'] <--- unwanted
['v', 'w', 'x']
['w', 'x', 'y']

Also, is it guaranteed, that the order of rows in single is correct, or does it only work in my minimal example by chance, because of the low cardinality? (I guess there may be a simple solution without this step in between.)

CodePudding user response:

Consider below approach

select [elems[offset(index - 1)], elems[offset(index)], elems[offset(index   1)]] as tuple
from your_table, unnest([array_length(elems)]) len,
unnest(generate_array(1, len - 2)) index            

if applied to sample data in your question - output is

enter image description here

CodePudding user response:

You might consider below query.

Also, is it guaranteed, that the order of rows in single is correct, or does it only work in my minimal example by chance, because of the low cardinality?

afaik, it's not quaranteeded without explicit using WITH OFFSET in the query.

WITH foo AS (
  SELECT ['a', 'b', 'c', 'd', 'e'] AS elems UNION ALL
  SELECT ['v', 'w', 'x', 'y']),
single AS (
  SELECT * FROM
    foo,
    UNNEST(elems) elem WITH OFFSET
),
tuples AS (
  SELECT ARRAY_AGG(elem) OVER (PARTITION BY FORMAT('%t', elems) ORDER BY offset ROWS BETWEEN 2 PRECEDING AND 0 FOLLOWING) AS tuple
  FROM single
)
SELECT * FROM tuples
WHERE ARRAY_LENGTH(tuple) >= 3;

enter image description here

CodePudding user response:

Just to give you another idea

create temp function slice(arr ARRAY<string>, pos float64, len float64)
returns array<string> language js as 
r"return arr.slice(pos, pos   len);";
select slice(elems, index, 3) as tuple
from foo, unnest([array_length(elems)]) len,
unnest(generate_array(0, len - 3)) index      

leaving it up to you to refactor above query to the point when it will look something like

select tuple
from foo, unnest(slices(elems, 3)) as tuple
  • Related