I am struggling with a RegEx problem. I have just started to explore RegEx so I am not very familiar yet with all the options. Here is my problem:
- I am reading a text file (invoice converted to text) content into a variable
- I am trying to find the total price in the invoice
- The word 'total' in the file can be written in many diffent ways, like so: 'tot' 'total:' 'total €:' 'total amount: 'total value' etc.
This is what my sample text file looks like:
<etc>
BTW-nr: 0071.21.210 B 08
KvK 27068921
TOTAL € 22.13
Maestro € 22.13
<etc>.
This is my Function to parse the file:
Public Function parse(inpPat As String)
'get the text file content
strFilename = "D:\temp\invoice.txt"
Open strFilename For Input As #iFile
strFileContent = Input(LOF(iFile), iFile)
Close #iFile
Set regexOne = New RegExp
regexOne.Global = False
regexOne.IgnoreCase = True
regexOne.Pattern = inpPat
Set theMatches = regexOne.Execute(strFileContent)
'show the matching strings
For Each Match In theMatches
res = Match.Value
Debug.Print res
Next
End Function
To find the total, I can use this:
parse "tot[\w] . (\d (\.|,)\d{2})"
That gives me this result:
TOTAL € 22.13
The Euro sign is distorted for some reason, but I do not care about that. (There are also invoices that omit the Euro sign). The thing is, I want to extract just the numerical value from the result string (22.13). I know I can create a VBA function to do that, but it is much cleaner to do it directly in the regex pattern. I tried this:
parse "(?:(?!tot[\w] . ))(\d (\.|,)\d{2})"
But that gives me this result:
0071.21
It ignores the 'Total' phrase and returns the first match of the second part of my pattern. How can I extract just the number 22.13?
CodePudding user response:
You can use
Dim strFileContent As String
strFileContent = "<etc>" & vbLf & "BTW-nr: 0071.21.210 B 08" & vbLf & "KvK 27068921" & vbLf & "TOTAL € 22.13" & vbLf & "Maestro € 22.13" & vbLf & "<etc>."
Set regexOne = New regExp
regexOne.Global = False
regexOne.IgnoreCase = True
regexOne.pattern = "tot\w(?:.*\D)?(\d [.,]\d{2})"
Set theMatches = regexOne.Execute(strFileContent)
'show the matching strings
For Each Match In theMatches
res = Match.SubMatches(0)
Debug.Print res
Next
to get 22.13
as output.
The tot\w(?:.*\D)?(\d [.,]\d{2})
regex matches
tot
- atot
string\w
- any word char (I kept it since you had it in your original pattern, but you might as well remove it)(?:.*\D)?
- an optional pattern matching any zero or more chars other than line break chars as many as possible and then a non-digit char (it is used to get to the "last number")(\d [.,]\d{2})
- Group 1 (we'll access the value viamatch.SubMatches(0)
): one or more digits, then a.
or,
and then two digits.