I have a table with these arrays:
ids | ids_positions | ids_position_exceptions |
---|---|---|
[A1, A2, A3, A4, A5] | [0, 1, 2, 3, 4] | [2, 3] |
[A1, A2, A3, A4] | [0, 1, 2, 3] | [0] |
ids and ids_positions will always have the same length. Basically ids_positions contains the index of each element of ids
ids_positions will always be ascending ordered, from 0 to a number n.
ids_position_exceptions contains some numbers from ids_positions that should be excluded in the final result (sometimes this array could be null, so no exclusions should be done)
I need this:
id | positions |
---|---|
A1 | 0 |
A2 | 1 |
A3 | 4 |
A4 | 5 |
A5 | 6 |
A1 | 1 |
A2 | 2 |
A3 | 3 |
A4 | 4 |
I have to explode by row the array_positions without the elements in array_exceptions and add the new values to continue the sequence.
For example, in the first row, I have 5 ids. There are two exceptions so I have to remove them and add the same numbers of elements at the end. I removed 2 and 3, and add 5 and 6.
I was thinking in something like: generate an array from 0 to the length of ids_positions ids_position_exceptions with an except if one element of the sequence generated is in array_exception. Then explode this new array into a column.
Idk if BigQuery have an ARRAY_EXCEPT function like SparkSQL but I think it's what I need for this case.
It's a big data problem, so the solution must be cost-effective
Thank you!
CodePudding user response:
Use below
select id, position from your_table t, t.array_positions position
except distinct
select id, exception from your_table t, t.array_exceptions exception
if applied to sample data in your question - output is
Below is answer on updated question
select id, position from (
select ids, array(
(select * from t.ids_positions except distinct select * from t.ids_position_exceptions)
union all
select array_length(ids_positions) i - 1
from unnest(generate_array(1, array_length(ids_position_exceptions))) i
) new_positions
from your_table t
), unnest(ids) id with offset
join unnest(new_positions) position with offset
using(offset)
with output
CodePudding user response:
Using a set operation looks excellent and below is an another possible option without a set operation.
SELECT id, p AS positions
FROM sample, UNNEST(array_positions) p
WHERE p NOT IN UNNEST(array_exceptions)
output:
below query is for your updated question :
SELECT id, position
FROM (
SELECT *,
ARRAY_LENGTH(array_exceptions) AS len_ex,
(SELECT AS STRUCT MAX(p) maxpos, ARRAY_AGG(p) pos
FROM UNNEST(array_positions) p
WHERE p NOT IN UNNEST(array_exceptions)) AS ex
FROM sample
),
UNNEST(ids) id WITH OFFSET JOIN
UNNEST(ARRAY_CONCAT(
ex.pos,
GENERATE_ARRAY(ex.maxpos 1, ex.maxpos len_ex)
)) position WITH OFFSET USING (offset)
;