Home > other >  Excel VBA - N number of Random records per unique group
Excel VBA - N number of Random records per unique group

Time:09-23

I am working on developing a system health check tool to validate that 4 different systems are in sync. To do that, I need to create a sample dataset of random N number of records for every unique key/combination from a main data set everyday. All 4 systems will be checked for records from this sample dataset and any differences will be highlighted using conditional formatting.

I am having trouble figuring out how to extract the sample dataset from the main dataset with the criteria mentioned above.

For Example, I have a report that has 700 rows. Each unique combination of the 6 fields [Client-Contractor-Distribution Center-Service Level-Alert Value-Status] has 100 records. This part will be dynamic. There could be any number of unique combinations and any number of records per combination. Image below for reference. Only the groups are shown here as I cannot paste 700 records in the question. There are 7 unique groups with 100 records each enter image description here

Let's say I want 5 random records for each of the 7 combinations. Essentially, I need a way to get 35 records that are randomly selected, 5 per unique combination. A sample of the desired output is shown below. enter image description here

I have tried using RAND() and RANDBETWEEN() formulas. I do get random records. But the problem is that I cannot ensure that I get 5 records per combination and sometimes duplicate records are returned as well. I am open to any method (VBA/Formulas) to achieve this.

CodePudding user response:

So basically you have 700 possibilities, and you want to get 5 random values out of them, while you are sure that you don't have duplicates?

There are, basically, two ways to do this:

  • You make a resulting collection of random values, you use the random generator to generate numbers from 1 to 700, but before adding them to your collection, you verify if they are already present in your collection. Something like (pseudo-code):

      Dim col_Result as Collection
      Dim finished as Boolean = False;
      Dim r as integer;
    
      while (not finished){
           r = ConvertToInt(Random(700))   1;
           if not(col_Result.Contains(r))
           then col_Result.Add(r);
           finished = (col_Result.Count == 5);
      }
    
  • You make a collection of all numbers from 1 to 700, and 5 times you retrieve a random value out of it, while subtracting that value from the collection. Something like (pseudo-code again):

      Dim col_Values as Collection = (1, 2, ..., 700);
      Dim col_Result as Collection;
    
      Dim r as integer;
    
      for (int i = 0; i < 5; i  ){
          r = ConvertToInt(Random(700));
          col_Result.Add(r);
          col_Values.Subtract(r);
      }
    

When using this last approach, it is vital that subtracting a value from the collection shifts the other values: (1,2,3,4,5).Subtract(2) yields (1,3,4,5).

CodePudding user response:

Please, test the next code. It needs a reference to 'Microsoft Scripting Runtime':

Sub RandomRecPerGroup()
   Dim sh As Worksheet, shRet As Worksheet, lastR As Long, dict As New Scripting.Dictionary
   Dim arr, arrIt, i As Long, j As Long, f As Long, k As Long, count As Long, arrFin
   
   Set sh = ActiveSheet 'use here the sheet you need
   Set shRet = sh.Next  'use here the sheet you need (for testing reason, the next against the active one)
   shRet.Range("G1").EntireColumn.NumberFormat = "@"   'format the column to keep 'Reference number' as text
   lastR = sh.Range("A" & sh.rows.count).End(xlUp).row 'last row
   arr = sh.Range("A2:G" & lastR).Value ' place the range in an array for faster iteration
   ReDim arrFin(1 To 5, 1 To 7): k = 1  'reDim the array to keep each group
   For i = 1 To UBound(arr)             'iterate between the array elements:
        'create a dictionary key if not already existing, with the number of the row as item:
        If Not dict.Exists(arr(i, 1) & arr(i, 2) & arr(i, 3) & arr(i, 4) & arr(i, 5) & arr(i, 6)) Then
            dict.Add arr(i, 1) & arr(i, 2) & arr(i, 3) & arr(i, 4) & arr(i, 5) & arr(i, 6), i
        Else 'adding the number of row, separated by "|"
            dict(arr(i, 1) & arr(i, 2) & arr(i, 3) & arr(i, 4) & arr(i, 5) & arr(i, 6)) = _
                  dict(arr(i, 1) & arr(i, 2) & arr(i, 3) & arr(i, 4) & arr(i, 5) & arr(i, 6)) & "|" & i
        End If
   Next i
   Dim rndNo As Long              'a variable to receive the random number
   For i = 0 To dict.count - 1    'iterate between the dictionary elements:
        arrIt = Split(dict.items(i), "|"): ' split the item by "|" to obtain the same group existing rows
        For k = 1 To 5            'iterate to extract the 5 necessary sample rows of each group
                Randomize         'initialize the random numbers generation
                If UBound(arrIt) = -1 Then Exit For     'for the case of less than 5 rows per group
                rndNo = CLng(UBound(arrIt) * Rnd())     'give a value to the variable keeping the random numbers
                For f = 1 To 7                          'iterating to place in the array all 7 columns value
                    arrFin(k, f) = arr(arrIt(rndNo), f)
                Next f
                arrIt = Filter(arrIt, arrIt(rndNo), False) 'eliminate the element just placed in an array, to avoid doubling
        Next k
        lastR = shRet.Range("A" & sh.rows.count).End(xlUp).row   1 'last empty row of the sheet where the result is returned
        shRet.Range("A" & lastR).Resize(5, 7).Value = arrFin       'drop the array content
   Next i
   MsgBox "Ready..."
End Sub

The code may work without the mentioned reference (using labe binding), but I think it should be good to benefit of intellisense suggestions. If it looks complicated to create it, please (firstly) run the next code which will add it automatically:

Sub addScrRunTimeRef()
  'Add a reference to 'Microsoft Scripting Runtime':
  'In case of error ('Programmatic access to Visual Basic Project not trusted'):
  'Options->Trust Center->Trust Center Settings->Macro Settings->Developer Macro Settings->
  '         check "Trust access to the VBA project object model"
  On Error Resume Next
  Application.VBE.ActiveVBProject.References.AddFromFile "C:\Windows\SysWOW64\scrrun.dll"
  If err.Number = 32813 Then
        err.Clear: On Error GoTo 0
        MsgBox "The reference already exists...": Exit Sub
  Else
        On Error GoTo 0
        MsgBox """Microsoft Scripting Runtime"" reference added successfully..."
  End If
End Sub

Saving the workbook will keep the reference. So, no need to run the code again...

  • Related