Given a table like the following
elems
['a', 'b', 'c', 'd', 'e']
['v', 'w', 'x', 'y']
I'd like to transform it into something like this:
tuple
['a', 'b', 'c']
['b', 'c', 'd']
['c', 'd', 'e']
['v', 'w', 'x']
['w', 'x', 'y']
I.e., I'd like to get all overlapping 3-tuples.
My current attempt looks as follows:
WITH foo AS (
SELECT ['a', 'b', 'c', 'd', 'e'] AS elems UNION ALL
SELECT ['v', 'w', 'x', 'y']),
single AS (
SELECT * FROM
foo,
UNNEST(elems) elem
),
tuples AS (
SELECT ARRAY_AGG(elem) OVER (ROWS BETWEEN 2 PRECEDING AND 0 FOLLOWING) AS tuple
FROM single
)
SELECT * FROM tuples
WHERE ARRAY_LENGTH(tuple) >= 3
But the problem is, it returns some unwanted rows too, i.e., the ones that are "between" the original rows from the foo
table.
tuple
['a', 'b', 'c']
['b', 'c', 'd']
['c', 'd', 'e']
['d', 'e', 'v'] <--- unwanted
['e', 'v', 'w'] <--- unwanted
['v', 'w', 'x']
['w', 'x', 'y']
Also, is it guaranteed, that the order of rows in single
is correct, or does it only work in my minimal example by chance, because of the low cardinality? (I guess there may be a simple solution without this step in between.)
CodePudding user response:
Consider below approach
select [elems[offset(index - 1)], elems[offset(index)], elems[offset(index 1)]] as tuple
from your_table, unnest([array_length(elems)]) len,
unnest(generate_array(1, len - 2)) index
if applied to sample data in your question - output is
CodePudding user response:
You might consider below query.
Also, is it guaranteed, that the order of rows in single is correct, or does it only work in my minimal example by chance, because of the low cardinality?
afaik, it's not quaranteeded without explicit using WITH OFFSET in the query.
WITH foo AS (
SELECT ['a', 'b', 'c', 'd', 'e'] AS elems UNION ALL
SELECT ['v', 'w', 'x', 'y']),
single AS (
SELECT * FROM
foo,
UNNEST(elems) elem WITH OFFSET
),
tuples AS (
SELECT ARRAY_AGG(elem) OVER (PARTITION BY FORMAT('%t', elems) ORDER BY offset ROWS BETWEEN 2 PRECEDING AND 0 FOLLOWING) AS tuple
FROM single
)
SELECT * FROM tuples
WHERE ARRAY_LENGTH(tuple) >= 3;
CodePudding user response:
Just to give you another idea
create temp function slice(arr ARRAY<string>, pos float64, len float64)
returns array<string> language js as
r"return arr.slice(pos, pos len);";
select slice(elems, index, 3) as tuple
from foo, unnest([array_length(elems)]) len,
unnest(generate_array(0, len - 3)) index
leaving it up to you to refactor above query to the point when it will look something like
select tuple
from foo, unnest(slices(elems, 3)) as tuple