I am working on developing a system health check tool to validate that 4 different systems are in sync. To do that, I need to create a sample dataset of random N number of records for every unique key/combination from a main data set everyday. All 4 systems will be checked for records from this sample dataset and any differences will be highlighted using conditional formatting.
I am having trouble figuring out how to extract the sample dataset from the main dataset with the criteria mentioned above.
For Example, I have a report that has 700 rows. Each unique combination of the 6 fields [Client-Contractor-Distribution Center-Service Level-Alert Value-Status] has 100 records. This part will be dynamic. There could be any number of unique combinations and any number of records per combination. Image below for reference. Only the groups are shown here as I cannot paste 700 records in the question. There are 7 unique groups with 100 records each
Let's say I want 5 random records for each of the 7 combinations. Essentially, I need a way to get 35 records that are randomly selected, 5 per unique combination. A sample of the desired output is shown below.
I have tried using RAND() and RANDBETWEEN() formulas. I do get random records. But the problem is that I cannot ensure that I get 5 records per combination and sometimes duplicate records are returned as well. I am open to any method (VBA/Formulas) to achieve this.
CodePudding user response:
So basically you have 700 possibilities, and you want to get 5 random values out of them, while you are sure that you don't have duplicates?
There are, basically, two ways to do this:
You make a resulting collection of random values, you use the random generator to generate numbers from 1 to 700, but before adding them to your collection, you verify if they are already present in your collection. Something like (pseudo-code):
Dim col_Result as Collection Dim finished as Boolean = False; Dim r as integer; while (not finished){ r = ConvertToInt(Random(700)) 1; if not(col_Result.Contains(r)) then col_Result.Add(r); finished = (col_Result.Count == 5); }
You make a collection of all numbers from 1 to 700, and 5 times you retrieve a random value out of it, while subtracting that value from the collection. Something like (pseudo-code again):
Dim col_Values as Collection = (1, 2, ..., 700); Dim col_Result as Collection; Dim r as integer; for (int i = 0; i < 5; i ){ r = ConvertToInt(Random(700)); col_Result.Add(r); col_Values.Subtract(r); }
When using this last approach, it is vital that subtracting a value from the collection shifts the other values: (1,2,3,4,5).Subtract(2)
yields (1,3,4,5)
.
CodePudding user response:
Please, test the next code. It needs a reference to 'Microsoft Scripting Runtime':
Sub RandomRecPerGroup()
Dim sh As Worksheet, shRet As Worksheet, lastR As Long, dict As New Scripting.Dictionary
Dim arr, arrIt, i As Long, j As Long, f As Long, k As Long, count As Long, arrFin
Set sh = ActiveSheet 'use here the sheet you need
Set shRet = sh.Next 'use here the sheet you need (for testing reason, the next against the active one)
shRet.Range("G1").EntireColumn.NumberFormat = "@" 'format the column to keep 'Reference number' as text
lastR = sh.Range("A" & sh.rows.count).End(xlUp).row 'last row
arr = sh.Range("A2:G" & lastR).Value ' place the range in an array for faster iteration
ReDim arrFin(1 To 5, 1 To 7): k = 1 'reDim the array to keep each group
For i = 1 To UBound(arr) 'iterate between the array elements:
'create a dictionary key if not already existing, with the number of the row as item:
If Not dict.Exists(arr(i, 1) & arr(i, 2) & arr(i, 3) & arr(i, 4) & arr(i, 5) & arr(i, 6)) Then
dict.Add arr(i, 1) & arr(i, 2) & arr(i, 3) & arr(i, 4) & arr(i, 5) & arr(i, 6), i
Else 'adding the number of row, separated by "|"
dict(arr(i, 1) & arr(i, 2) & arr(i, 3) & arr(i, 4) & arr(i, 5) & arr(i, 6)) = _
dict(arr(i, 1) & arr(i, 2) & arr(i, 3) & arr(i, 4) & arr(i, 5) & arr(i, 6)) & "|" & i
End If
Next i
Dim rndNo As Long 'a variable to receive the random number
For i = 0 To dict.count - 1 'iterate between the dictionary elements:
arrIt = Split(dict.items(i), "|"): ' split the item by "|" to obtain the same group existing rows
For k = 1 To 5 'iterate to extract the 5 necessary sample rows of each group
Randomize 'initialize the random numbers generation
If UBound(arrIt) = -1 Then Exit For 'for the case of less than 5 rows per group
rndNo = CLng(UBound(arrIt) * Rnd()) 'give a value to the variable keeping the random numbers
For f = 1 To 7 'iterating to place in the array all 7 columns value
arrFin(k, f) = arr(arrIt(rndNo), f)
Next f
arrIt = Filter(arrIt, arrIt(rndNo), False) 'eliminate the element just placed in an array, to avoid doubling
Next k
lastR = shRet.Range("A" & sh.rows.count).End(xlUp).row 1 'last empty row of the sheet where the result is returned
shRet.Range("A" & lastR).Resize(5, 7).Value = arrFin 'drop the array content
Next i
MsgBox "Ready..."
End Sub
The code may work without the mentioned reference (using labe binding), but I think it should be good to benefit of intellisense suggestions. If it looks complicated to create it, please (firstly) run the next code which will add it automatically:
Sub addScrRunTimeRef()
'Add a reference to 'Microsoft Scripting Runtime':
'In case of error ('Programmatic access to Visual Basic Project not trusted'):
'Options->Trust Center->Trust Center Settings->Macro Settings->Developer Macro Settings->
' check "Trust access to the VBA project object model"
On Error Resume Next
Application.VBE.ActiveVBProject.References.AddFromFile "C:\Windows\SysWOW64\scrrun.dll"
If err.Number = 32813 Then
err.Clear: On Error GoTo 0
MsgBox "The reference already exists...": Exit Sub
Else
On Error GoTo 0
MsgBox """Microsoft Scripting Runtime"" reference added successfully..."
End If
End Sub
Saving the workbook will keep the reference. So, no need to run the code again...