Home > Back-end >  BigQuery: generate and explode array with exceptions based on another array
BigQuery: generate and explode array with exceptions based on another array

Time:06-22

I have a table with these arrays:

ids ids_positions ids_position_exceptions
[A1, A2, A3, A4, A5] [0, 1, 2, 3, 4] [2, 3]
[A1, A2, A3, A4] [0, 1, 2, 3] [0]

ids and ids_positions will always have the same length. Basically ids_positions contains the index of each element of ids

ids_positions will always be ascending ordered, from 0 to a number n.

ids_position_exceptions contains some numbers from ids_positions that should be excluded in the final result (sometimes this array could be null, so no exclusions should be done)

I need this:

id positions
A1 0
A2 1
A3 4
A4 5
A5 6
A1 1
A2 2
A3 3
A4 4

I have to explode by row the array_positions without the elements in array_exceptions and add the new values to continue the sequence.

For example, in the first row, I have 5 ids. There are two exceptions so I have to remove them and add the same numbers of elements at the end. I removed 2 and 3, and add 5 and 6.

I was thinking in something like: generate an array from 0 to the length of ids_positions ids_position_exceptions with an except if one element of the sequence generated is in array_exception. Then explode this new array into a column.

Idk if BigQuery have an ARRAY_EXCEPT function like SparkSQL but I think it's what I need for this case.

It's a big data problem, so the solution must be cost-effective

Thank you!

CodePudding user response:

Use below

select id, position from your_table t, t.array_positions position
except distinct
select id, exception from your_table t, t.array_exceptions exception     

if applied to sample data in your question - output is

enter image description here

Below is answer on updated question

select id, position from (
  select ids, array(
      (select * from t.ids_positions except distinct select * from t.ids_position_exceptions)
      union all
      select array_length(ids_positions)   i - 1 
      from unnest(generate_array(1, array_length(ids_position_exceptions))) i
    ) new_positions
  from your_table t
), unnest(ids) id with offset
join unnest(new_positions) position with offset
using(offset)     

with output

enter image description here

CodePudding user response:

Using a set operation looks excellent and below is an another possible option without a set operation.

SELECT id, p AS positions
  FROM sample, UNNEST(array_positions) p
 WHERE p NOT IN UNNEST(array_exceptions)

output:

enter image description here

below query is for your updated question :

SELECT id, position
  FROM (
    SELECT *,
           ARRAY_LENGTH(array_exceptions) AS len_ex,
           (SELECT AS STRUCT MAX(p) maxpos, ARRAY_AGG(p) pos 
              FROM UNNEST(array_positions) p 
             WHERE p NOT IN UNNEST(array_exceptions)) AS ex
      FROM sample
  ),
UNNEST(ids) id WITH OFFSET JOIN 
UNNEST(ARRAY_CONCAT(
         ex.pos, 
         GENERATE_ARRAY(ex.maxpos   1, ex.maxpos   len_ex)
      )) position WITH OFFSET USING (offset)
;

enter image description here

  • Related