Home > Net >  Joining 2nd Table with Random Row to each record
Joining 2nd Table with Random Row to each record

Time:12-29

I need to join table B to Table A, where Table B's records are randomly assigned, or joined. Most of the queries out there are based off of having a key between them and conditions, where I just want to randomly join records without a key.

I'm not sure where to start, as none of the queries I've found are doing this. I assume a nested join could be helpful for this, but how can I randomly assort the records on join?

**Table A**
| Associate ID| Statement|
|:----:  |:------:|
| 33691| John is |
| 82451| Susie is |
| 25485| Sam is|
| 26582| Lonnie is|
| 52548| Carl is|

**Table B**
| RowID | List|
|:----:  |:------:|
| 1| admirable|
| 2| astounding|
| 3| excellent|
| 4| awesome|
| 5| first class|

The result would be something like this, where items from the list are not looped through in order, but random:

**Result Table**
| Associate ID| Statement| List|
|:----:  |:------:|:------:|
| 33691| John is |astounding|
| 82451| Susie is |first class|
| 25485| Sam is|admirable|
| 26582| Lonnie is|excellent|
| 52548| Carl is|awesome|

These are some of the queries I've tried:

CodePudding user response:

using 2 CTEs we generate a select which generates a row number for each table based on a random order and then join based on that row number.

Using a CTE to get N times the records in B as described here: Repeat Rows N Times According to Column Value (Not included below) Note to get the "N" you'll need to get count from A and B, then divide by eachother and Add 1.

Assuming Even Distribution

With A as(
SELECT *, Row_number() over (order by NewID()) RN
FROM A),
B as (
SELECT *, Row_number () over (order by NewID()) RN
FROM B)
SELECT *
FROM A
INNER JOIN B
 on A.RN = B.RN

Or use (assuming uneven distribution)

SELECT * 
FROM A
CROSS APPLY (SELECT TOP 1 * FROM B ORDER BY NewID()) Z

CodePudding user response:

This method assumes you know in advance which is the smaller table.

First it assigns an ascending row numbering from 1. This does not have to be randomized.

Then for each row in the larger table it uses the modulus operator to randomly calculate a row number in the range to join onto.

WITH Small
     AS (SELECT *,
                ROW_NUMBER() OVER ( ORDER BY (SELECT 0)) AS RN
         FROM   SmallTable),
     Large
     AS (SELECT *,
                1   CRYPT_GEN_RANDOM(3) % (SELECT COUNT(*) FROM SmallTable) AS RND
         FROM   LargeTable
         ORDER  BY RND
         OFFSET 0 ROWS)
SELECT *
FROM   Large
       INNER JOIN Small
               ON Small.RN = Large.RND 

The ORDER BY RND OFFSET 0 ROWS is to get the random numbers materialized in advance.

This will allow a MERGE join on the smaller table. It also avoids an issue that can sometimes happen where the CRYPT_GEN_RANDOM is moved around in the plan and only evaluated once rather than once per row as required.

  • Related