Home > Back-end >  Evenly distributing related records
Evenly distributing related records

Time:03-18

I have a table with over 100k mailboxes and users with permissions.

 --------- --------- 
| Mailbox | Trustee |
 --------- --------- 
| smb1    | mbx1    |
| smb2    | mbx1    |
| smb2    | mbx2    |
| smb2    | mbx3    |
| smb3    | mbx4    |
| smb3    | mbx5    |
| mbx1    | mbx6    |
| mbx7    | mbx4    |
| smb4    | mbx8    |
| smb4    | mbx9    |
| mbx8    | mbx10   |
 --------- --------- 

Need to group Trustee and mailboxes they have access to in Mailbox column. e.g. mbx1, mbx2, and mbx 3 are related by access to smb2, so they go in bucket 1. mbx going into bucket 1 means smb1 also goes into bucket 1 as mbx 1 is a Trustee on that. Then further down because mbx6 has relation to mbx1 it also goes into bucket 1. Hope the others make sense. So note Trustee's can have access so smb (shared Mailboxes) or mbx (mailboxes)

The table I'm selecting from just has Mailbox and Trustee and I want to write into a temp table below.

 --------- --------- -------- 
| Mailbox | Trustee | Bucket |
 --------- --------- -------- 
| smb1    | mbx1    |      1 |
| smb2    | mbx1    |      1 |
| smb2    | mbx2    |      1 |
| smb2    | mbx3    |      1 |
| smb3    | mbx4    |      2 |
| smb3    | mbx5    |      2 |
| mbx1    | mbx6    |      1 |
| mbx7    | mbx4    |      2 |
| smb4    | mbx8    |      3 |
| smb4    | mbx9    |      3 |
| mbx8    | mbx10   |      3 |
 --------- --------- -------- 

I then want to put the bucket counts together to make evenly groups. Idea being I can say e.g. max count 100, so create groups of buckets that count up to around 100 users.

 --------- --------- ------- 
| Groups  | Buckets | Count |
 --------- --------- ------- 
|       1 |       1 |     5 |
|       2 |     2,3 |     6 |
 --------- --------- ------- 

EDIT: I've gotten this far where I can pass in a mailbox and get all the trustees and then the other mailboxes those trustees have access to.

DECLARE @int int = 1;
WITH Buckets_CTE
    (Trustee)
AS (
    SELECT DISTINCT Trustee
    FROM EXOPerms
        WHERE Mailbox = 'smb1'
)
SELECT DISTINCT Mailbox,Trustee
    FROM EXOPerms
    Where Trustee IN (
    SELECT DISTINCT Trustee
    FROM Buckets_CTE)
    ORDER BY Trustee

The DECLARE Int at the top is redundant at the moment just got that there to see if I can implement the bucket feature.

CodePudding user response:

Here is a while loop solution. It just iterate through every row and update the Bucket.

ID is added for looping through the data row by row

To check if a mailbox/trustee exists in another row, check for i.Mailbox in (m.Mailbox, m.Trustee) :

from @mailbox i
     inner join @mailbox m  
     on  i.ID   <> m.ID   -- don't compare the same row
     and (
            i.Mailbox   in (m.Mailbox, m.Trustee) 
         or i.Trustee   in (m.Mailbox, m.Trustee) 
         )

Note that when update the Bucket, it compare with current Bucket and only takes the lower value. This is to resolve case like below where the relationship between earlier rows are not known until later rows.

ID  MailBox  Trustee 
1   a        b      
2   c        d
3   e        f
4   c        f

ID 1, 2, 3 is assign separate Bucket when process sequentially. Only when process ID 4, that it links ID 2 and 3 together


Complete query

declare @mailbox table
(
    ID      int        identity,
    Mailbox varchar(5),
    Trustee varchar(5),
    Bucket  int
)

insert into @mailbox (Mailbox, Trustee) values
( 'smb1',    'mbx1' ),
( 'smb2',    'mbx1' ),
( 'smb2',    'mbx2' ),
( 'smb2',    'mbx3' ),
( 'smb3',    'mbx4' ),
( 'smb3',    'mbx5' ),
( 'mbx1',    'mbx6' ),
( 'mbx7',    'mbx4' ),
( 'smb4',    'mbx8' ),
( 'smb4',    'mbx9' ),
( 'mbx8',    'mbx10');

declare @ID     int,
        @Bucket int = 1    -- start from 1

-- get the minimum ID for start
select  @ID = min(ID) from @mailbox where Bucket is null

while   exists 
        (
            select  *
            from    @mailbox
            where   ID >= @ID
        )
begin
        -- if the mailbox is found in other row with Bucket value
        -- (Bucket is not null)
        if  exists
        (
            select  *
            from    @mailbox i
                    inner join @mailbox m   
                    on  i.ID    <> m.ID
                    and (
                            i.Mailbox   in (m.Mailbox, m.Trustee) 
                        or  i.Trustee   in (m.Mailbox, m.Trustee) 
                        )
            where   i.ID    = @ID
            and     m.Bucket    is not null
        )
        begin
            -- Update Bucket from other row
            update  i
            set     Bucket  = case  when i.Bucket is null
                                    or   i.Bucket > m.Bucket
                                    then m.Bucket
                                    else i.Bucket
                                    end
            from    @mailbox i
                    inner join @mailbox m   
                    on  i.ID    <> m.ID
                    and (
                            i.Mailbox   in (m.Mailbox, m.Trustee) 
                        or  i.Trustee   in (m.Mailbox, m.Trustee) 
                        )
            where   i.ID    = @ID
            and     m.Bucket    is not null

            -- Update other rows that might linked to current ID
            update  m
            set     Bucket  = case  when    i.Bucket > m.Bucket
                                    then    m.Bucket
                                    else    i.Bucket
                                    end
            from    @mailbox i
                    inner join @mailbox m   
                    on  i.ID    <> m.ID
                    and (
                            i.Mailbox   in (m.Mailbox, m.Trustee) 
                        or  i.Trustee   in (m.Mailbox, m.Trustee) 
                        )
            where   i.ID        = @ID
    end
    else
    begin
        -- no other row found with same mailbox. 
        -- Assign Bucket from @Bucket, increment @Bucket
        update  m
        set     Bucket  = @Bucket
        from    @mailbox m
        where   m.ID    = @ID;

        select  @Bucket = @Bucket   1;
    end

    -- Get next ID
    select  @ID = min(ID) from @mailbox where ID > @ID;
end

select  *
from    @mailbox
order by ID
  • Related