Home > Mobile >  Extract 2 numbers from text (bulk list)
Extract 2 numbers from text (bulk list)

Time:11-29

Background info:

I have a listing of 5000 error messages in this format:

"999999 16 901 F SMITH, Smith FT 1 1.0 Additional Leave hours -4.0000 exceed entitlement plus pro-rata -4.0000"

I have been able to categorise them using a macro, so "Additional Leave hours exceed entitlement plus pro-rata" for example.

From there I'm trying to extract the two numbers.

I can do it manually using these formulas:

=MID(J3,SEARCH("hours ",J3) 5,SEARCH("exceed",J3)-SEARCH("hours ",J3)-6)
   
=TRIM(RIGHT(SUBSTITUTE(J3," ",REPT(" ",LEN(J3))),LEN((J3))))

But this is where i'm stuck, incorporating that logic in the macro and having it loop through the full list.

This was my first attempt:

If InStr(myString, "Additional Leave hours ") > 0 And InStr(myString, "exceed entitlement plus pro-rata") Then

'set category
Cells(x, 6).Value = "Additional Leave hours exceed entitlement plus pro-rata"

'first number
Cells(x, 8).ForumlaR1C1 = "=MID(RC[2],SEARCH(""hours "",RC[2]) 5,SEARCH(""exceed"",RC[2])-SEARCH(""hours "",RC[2])-6"

'second number
Cells(x, 9).FormulaR1C1 = "=TRIM(RIGHT(SUBSTITUTE(RC[2],"" "",REPT("" "",LEN(RC[2]))),LEN((RC[2]))))"

'first minus second
Cells(x, 7).FormulaR1C1 = "=SUM(RC[2]-RC[1]"
    
End If

From there I have been able to use .Select & .Active cell, and it works but not efficiently:

'first number
Cells(x, 8).Select
        
ActiveCell.FormulaR1C1 = "=MID(RC[2],SEARCH(""hours"",RC[2]) 5,SEARCH(""exceed"",RC[2])SEARCH(""hours "",RC[2])-6)"

Any help would be appreciated, thanks in advance.

CodePudding user response:

Idea is to process all the strings in an array (so that it's faster, compared to writing/reading cells 1 by 1), use RegExp to extract the 2 numbers into an array which will be used to paste into the previous 2 columns. Finally insert the SUM formula into the column before:

Sub Test()
    Const inputStartRow As Long = 1
    Const inputCol As String = "J"
    Const regexPattern As String = "Additional Leave hours ([-\d.]{1,}) exceed entitlement plus pro-rata ([-\d.]{1,})"
    
    Dim ws As Worksheet
    Set ws = ThisWorkbook.Worksheets("Sheet1") 'Change name accordingly
    
    Dim regex As Object
    Set regex = CreateObject("VBScript.RegExp")
    With regex
        .Pattern = regexPattern
        .Global = False
    End With
    
    '==== Get last row of the input column and set to range
    Dim inputLastRow As Long
    inputLastRow = ws.Cells(ws.Rows.Count, inputCol).End(xlUp).Row
        
    Dim inputRng As Range
    Set inputRng = ws.Range(ws.Cells(inputStartRow, inputCol), ws.Cells(inputLastRow, inputCol))
    
    '==== Populate the array with the input range's value
    Dim inputArr As Variant
    inputArr = inputRng.Value
        
    Dim outputArr() As String
    ReDim outputArr(1 To UBound(inputArr, 1), 1 To 2) As String
    
    '==== Loop through the array and extract the 2 numbers
    Dim i As Long
    For i = 1 To UBound(inputArr, 1)
        If InStr(inputArr(i, 1), "Additional Leave hours ") > 0 And InStr(inputArr(i, 1), "exceed entitlement plus pro-rata") Then
            If regex.Test(inputArr(i, 1)) Then
                Dim regexMatch As Object
                Set regexMatch = regex.Execute(inputArr(i, 1))(0)
                                
                outputArr(i, 1) = regexMatch.SubMatches(0)
                outputArr(i, 2) = regexMatch.SubMatches(1)
            End If
        End If
    Next i
    
    '==== Insert the extraction @ Input column - 1/ -2
    Dim outputRng As Range
    Set outputRng = inputRng.Offset(, -2).Resize(, 2)
    outputRng.Value = outputArr
    
    Set outputRng = Nothing
    
    '==== Add in SUM formula @ Input Column - 3
    Dim sumRng As Range
    Set sumRng = inputRng.Offset(, -3)
    sumRng.Formula = "=SUM(" & ws.Cells(inputStartRow, sumRng.Column   1).Address(RowAbsolute:=False) & "-" & ws.Cells(inputStartRow, sumRng.Column   2).Address(RowAbsolute:=False) & ")"
    
    Set sumRng = Nothing
End Sub
  • Related