Home > Software engineering >  Join specific number of rows from one table to a single row of another one
Join specific number of rows from one table to a single row of another one

Time:04-14

I'm currently generating random data and I'm not sure how to do this part. I have two tables. I have one entity table and a group table. Both tables have a unique identifiers (guid). My goal is to create a table mappings that matches entities X amount of times to a single group.

For example, let's say I have 10k entities and 100 groups and I would like to create 1k mappings. I would need to create 10 mappings per group (1000/100). I could go ahead and create a while loop and just get 10 random entities, but it is not efficient and I would like to avoid that. I also thought about cross join and then reduce the result to 1k, but I would like each group to have the exact same number of mappings.

Is there any way to do this with a single query ?

Here is an example, say I have this Group table and Entity Table


 --------- 
| GroupId |
 --------- 
| G1      |
| G2      |
| G3      |
 --------- 
 ---------- 
| EntityId |
 ---------- 
| E1       |
| E2       |
| E3       |
| E4       |
| E5       |
| E6       |
| E7       |
| E8       |
| E9       |
| E10      |
 ---------- 

I would like to get 6 mappings (so each group needs 2 mappings). Here is an example of what I could get:

 --------- ---------- 
| GroupId | EntityId |
 --------- ---------- 
| G1      | E2       |
| G1      | E4       |
| G2      | E9       |
| G2      | E10      |
| G3      | E2       |
| G3      | E5       |
 --------- ---------- 


CodePudding user response:

This sounds like a good use for NTILE.

Start by using ROW_NUMBER() on the Groups table to generate sequential "group numbers" (sort by NewID() for a relatively random order):

GroupId | GroupNum
:------ | -------:
G3      |        1
G2      |        2
G1      |        3

Then use NTILE() to distribute the Entity rows into the desired number of groups:

E1  | GroupNum
:-- | -------:
E2  |        1
E10 |        1
E3  |        1
E4  |        1
E1  |        2
E6  |        2
E5  |        2
E7  |        3
E8  |        3
E9  |        3

Finally JOIN the results together on "group number":

SQL:

DECLARE @NumOfGroups INT;

SET @NumOfGroups = ( SELECT COUNT(*) FROM Groups );

SELECT e.[E1], g.GroupId
FROM  (
        SELECT  *, NTILE(@NumOfGroups) OVER(ORDER BY NEWID()) AS GroupNum 
        FROM    Entities 
      ) 
      AS e 
      INNER JOIN 
      (
         SELECT *, ROW_NUMBER() OVER(ORDER BY NEWID()) AS GroupNum
         FROM   Groups
      )  
      AS g ON g.GroupNum = e.GroupNum
ORDER BY e.GroupNum, e.E1
GO

Results:

E1 GroupId
E1 G1
E3 G1
E5 G1
E8 G1
E2 G3
E7 G3
E9 G3
E10 G2
E4 G2
E6 G2

db<>fiddle here

NB: Obviously this approach won't work if there are less Entity rows than Groups, but it doesn't sound like that's an issue here

  • Related