Getting 3 digits from a set of numbers using RegEx for VBA-CodePudding

I have a set of numbers (15 digits) and need to get the value of the 10th - 11th digit. I am working on using RegEx but so far have only made it to select the first 9 digits and can’t go further from there.

^[1-9\][0-9\][0-9\][0-9\][0-9\][0-9\][0-9\][0-9\][0-9\]

Sample digits are:

188202143002267.50 (need to get only 002)
188202143-002267.50 (need to get only 002 —- already resolved this)

Or if there is a better way to do this? I am working on Excel VBA.

CodePudding user response：

Using regexp.replace and Mid():

Sub Tester2()

    Debug.Print Mid(DigitsOnly("188202143002267.50"), 10, 3)
    Debug.Print Mid(DigitsOnly("188202143-002267.50"), 10, 3)
    Debug.Print Mid(DigitsOnly("XXX188/202-143-002267.50"), 10, 3)

End Sub


Function DigitsOnly(v)
    Static re As Object
    If re Is Nothing Then
        Set re = CreateObject("vbscript.RegExp")
        re.Global = True
        re.Pattern = "\D"
    End If
    DigitsOnly = re.replace(v, "")
End Function

CodePudding user response：

Use Replace and Mid:

Value = "188202143002267.50"
' or:
' Value = "188202143-002267.50"

Digit3 = Mid(Replace(Value, "-", ""), 10, 3)

' Digit3 -> "002"

CodePudding user response：

Alternative via FilterXML()

My intention is not to show a better or more elegant way by this alternative to RegEx, but to point out how one can use FilterXML (available after all since 2013 ) in VBA to return filtered results.

Since the handling of XML is still seen as a bit cryptic, I would like to contribute with some hints to a better understanding of how the FilterXML function works.

What does FilterXML:

The function returns specific data
from a (wellformed) XML content string (roughly comparable to a html-tag structure with individual, but case sensitive node names)
by using a specified XPath expression string (c.f. @JvdV 's excellent examples at Extract substrings ..)

Syntax - see section 2

    FILTERXML(xml, xpath)

Argument 1: XML content - see section 1

The XML standard allows to design use a customized markup language by your own. If you regard each character of an assumed input "A123" as node element, you could

name the starting DocumentElement e.g. r (symbolizing root) and
the individual nodes containing the input characters e.g. i
(and take care of correctly set closing tags

This results in a simple well formed hierarchy like

    <r>
        <i>A</i>
        <i>1</i>
        <i>2</i>
        <i>3</i>
    </r>

Note that the conversion to a valid xml content in this example is executed by a tricky splitting into single characters plus eventual joining together to insert the wanted node names (~ "tags") into the final xml content string.

Argument 2: XPath expression - see section 2

The XML hierarchy allows XPath expressions to search within any indicated node and sub-node references. Thus such an XPath expression string to isolate and return digits only like in the posted question could be

    "//i[.*0=0]"

where //i indicates to search occurrencies of i at any hierarchy level, the brackets [] enclose the filter condition to return only digits, expressed by the multiplication of the referenced node content . with zero if resulting in zero .*0=0, thus excluding other elements than digits.

In contrast to XMLDoM (Document Object Model) coding, filtering via Application.FilterXML 's XPath expressions does not return node objects (or defined node types), but nothing other than the "assigned" node content values (textual child elements in XMLDoM), which Excel for this special question would interpret automatically as numbers (here even: Double!)

*) XMLDoM is a cross-platform and language-independent interface treating the document as a tree structure and allowing programmatic access to the tree

Returned results - see section 3

Results are returned as follows:

a) in the case of several results as a vertical 2-dim generally variant array (in tabular Excel 2019 as dynamic spill range),
b) if there is only one result: as a single value,
c) if there is no result (or an incorrect node query): as an error value (the worksheet function would return #VALUE!, the late-bound Application.FilterXML() function Error 2015)

Before transforming these different filtering results to any wanted output (e.g. by transposing or type transformations) you would check them via VarType() - see section 2) in the following code example:

Function GetDigitsOnly()

Example call due to OP: Mid$(DigitsOnly(s), 10, 2) where s represents a string input

Function DigitsOnly(ByVal s As String) As String
    If Not Len(s) Then Exit Function        ' escape if empty string
'1) Get XML content (as FilterXML's first argument)
    'a) make string splitable via vbNullChar delimiter
    s = StrConv(s, vbUnicode)
    'b) atomize via Split()
    Dim tmp
    tmp = Split(Left(s, Len(s) - 1), vbNullChar, Len(s) \ 2   1)
    'c) get wellformed (html-like) xml content
    tmp = "<r><i>" & Join(tmp, "</i><i>") & "</i></r>"
'2) apply FilterXML({content},{XPath}) to get digits only
    tmp = Application.FilterXML(tmp, "//i[.*0=0]")
'3) return string result
    Select Case VarType(tmp)
        Case vbError                                 ' not any digit found
            Exit Function
        Case vbArray   vbVariant                      ' vertical digits array
            DigitsOnly = Join(Application.Transpose(tmp), vbNullString)
        Case Else                                    ' single (here e.g.: vbDouble) result
            DigitsOnly = tmp
    End Select
End Function