Home > OS >  How Do I Randomly Match Elements in BigQuery
How Do I Randomly Match Elements in BigQuery

Time:08-26

I want to randomly match elements of a table by some condition (1:1 and symmetric pairing, so if user x matches with user y, then user y matches with user x)

WITH tbl AS (
  SELECT 1 AS user, 'A' AS condition
  UNION ALL
  SELECT 2 AS user, 'A' AS condition
  UNION ALL
  SELECT 3 AS user, 'A' AS condition
  UNION ALL
  SELECT 4 AS user, 'B' AS condition
  UNION ALL
  SELECT 5 AS user, 'B' AS condition
  UNION ALL
  SELECT 6 AS user, 'B' AS condition
  UNION ALL
  SELECT 7 AS user, 'B' AS condition
  UNION ALL
  SELECT 8 AS user, 'B' AS condition
  UNION ALL
  SELECT 9 AS user, 'B' AS condition
)
SELECT
  user,
  condition
FROM tbl

How can I generate a query to:

  1. randomly match all users in condition B with another user in condition B
  2. do the same with condition A, but leave one condition A user unmatched because there are an odd number of users with condition A
  3. identifies the unmatched user (1, 2, or 3) with a NULL match

Hypothetical result:

user condition match
1 A 3
2 A NULL
3 A 1
4 B 7
5 B 9
6 B 8
7 B 4
8 B 6
9 B 5

CodePudding user response:

Consider below approach

create temp table temp as 
select _0 as user, condition, _1 as match from (
  select user, condition, div(offset, 2) grp, mod(offset, 2) pos
  from (
    select condition, array_agg(user order by rand()) users
    from your_table
    group by condition
  ), unnest(users) user with offset
)
pivot (any_value(user) for pos in (0,1));

select * from (
  select * from temp union all
  select match, condition, user from temp 
)
where not user is null;            

if applied to sample data in your question - output is

enter image description here

As you can see above solution requires scripting to be involved - so below is slightly refactored version that allows you to achive same result with just one "simple" query

select * from (
  select if(grp < 0, _0, _1) as user, condition, if(grp < 0, _1, _0) as match from (
    select user, condition, grp, mod(offset, 2) pos
    from (
      select condition, array_agg(user order by rand()) users
      from your_table
      group by condition
    ), unnest(users) user with offset, unnest([div(offset   2, 2), -1 * div(offset   2, 2)]) grp
  )
  pivot (any_value(user) for pos in (0,1))
)
where not user is null
  • Related