I'm currently trying to create a pattern to capture a name that generally is text [A-ZÅÄÖ]
but there could be one or two numeric characters.
Here is one example: (there are more fields than this, this is just an example)
PT22 111222333 2 BROTHERS RP02 570244222333
PT25 888222333 THIS TIME IT'S ONLY TEXT RP01 888244222333
I'm trying to get 2 BROTHERS
and THIS TIME IT'S ONLY TEXT
in this tab delimited string (rows are for eached).
When I used the pattern "(\b[A-ZÅÄÖ \&\.\´\:-_] \b)"
it captured BROTHERS
, if I add 0-9
in the pattern it takes either PT22
or 111222333
if the first field is empty.
So what I would need is something that captures the field that has at least three characters that is A-ZÅÄÖ and maximum 2 characters 0-9.
Is that even possible or do I need to loop through the fields and count each type of character to determine if I'm in the correct field or not?
The fields are not in the same order every time.
My backup plan is to split on tab, and loop. On each field remove all [0-9] characters and count.
That way I will get how many [0-9] and how many [A-ZÅÄÖ] ( other) there is, and that could be used I think to find the correct field.
CodePudding user response:
I am not an regex expert, maybe it is possible, but I usually find it too hard to figure this out - and ending with an regex where you're not 100% sure it fits, plus it is hard to figure out what it does once you come back to it.
I would use a small function that simply count the number of chars and digits of a string. Easy to write, easy to read, easy to adapt. Could look like
Function checkWord(s As String) As Boolean
Const minChars = 3
Const maxDigits = 2
Dim i As Long, digits As Long, chars As Long
For i = 1 To Len(s)
Select Case AscW(Mid(s, i, 1))
Case AscW("A") To AscW("Z"), AscW("Ä"), AscW("Ö"), AscW("Å"):
chars = chars 1
Case Asc("0") To Asc("9"):
digits = digits 1
Case Else
' Think about what happens with all other characters...
End Select
Next
checkWord = (chars >= minChars And digits <= maxDigits)
End Function
CodePudding user response:
Without RegEx:
Can't you turn the logic around and remove parts you don't want:
- Loop over each value;
- Your values are tab-delimited so split on
VbTab
; - Loop over each sub-element;
- Test each sub-element with the
Like()
operator and check that it's not numeric.
To test this I came up with the following dummy-code:
Sub Test()
Dim r As Variant: r = Array("PT22" & vbTab & "111222333" & vbTab & "2 BROTHERS" & vbTab & "RP02" & vbTab & "570244222333", "PT25" & vbTab & "888222333" & vbTab & "THIS TIME IT'S ONLY TEXT" & vbTab & "RP01" & vbTab & "888244222333")
For Each el In r
For Each SubEl In Split(el, Chr(9))
If Not SubEl Like "??[0-9][0-9]" And Not IsNumeric(SubEl) Then
Debug.Print SubEl
Exit For
End If
Next
Next
End Sub
With RegEx:
If regex is a must, try to capture what you do want:
Sub Test()
Dim r As Variant: r = Array("PT22" & vbTab & "111222333" & vbTab & "2 BROTHERS" & vbTab & "RP02" & vbTab & "570244222333", "PT25" & vbTab & "888222333" & vbTab & "THIS TIME IT'S ONLY TEXT" & vbTab & "RP01" & vbTab & "888244222333")
With CreateObject("vbscript.regexp")
.Global = True
.Pattern = "(?:^|\s)((?:[A-ZÅÄÖ] (?:[ &.'_-][A-ZÅÄÖ] )*\s?)?(?:\b\d\d?\b)?(?:\s?[A-ZÅÄÖ] (?:[ &.'_-][A-ZÅÄÖ] )*)?)(?:\s|$)|."
For Each el In r
Debug.Print Trim(.Replace(el, "$1"))
Next
End With
End Sub
Or, again remove only those parts you don't want:
Sub Test()
Dim r As Variant: r = Array("PT22" & vbTab & "111222333" & vbTab & "2 BROTHERS" & vbTab & "RP02" & vbTab & "570244222333", "PT25" & vbTab & "888222333" & vbTab & "THIS TIME IT'S ONLY TEXT" & vbTab & "RP01" & vbTab & "888244222333")
With CreateObject("vbscript.regexp")
.Global = True
.Pattern = "\s*\b(?:[A-Z]{2}\d\d|\d{3,})\b\s*"
For Each el In r
Debug.Print .Replace(el, "")
Next
End With
End Sub
Results:
All above options would print:
2 BROTHERS
THIS TIME IT'S ONLY TEXT