Home > Mobile >  VBA regex problem, trying to extract just the numeric value from the match
VBA regex problem, trying to extract just the numeric value from the match

Time:01-31

I am struggling with a RegEx problem. I have just started to explore RegEx so I am not very familiar yet with all the options. Here is my problem:

  1. I am reading a text file (invoice converted to text) content into a variable
  2. I am trying to find the total price in the invoice
  3. The word 'total' in the file can be written in many diffent ways, like so: 'tot' 'total:' 'total €:' 'total amount: 'total value' etc.

This is what my sample text file looks like:

<etc>
BTW-nr: 0071.21.210 B 08
KvK 27068921
TOTAL € 22.13
Maestro € 22.13
<etc>.

This is my Function to parse the file:

Public Function parse(inpPat As String)

'get the text file content
strFilename = "D:\temp\invoice.txt"
Open strFilename For Input As #iFile
strFileContent = Input(LOF(iFile), iFile)
Close #iFile

Set regexOne = New RegExp
regexOne.Global = False 
regexOne.IgnoreCase = True

regexOne.Pattern = inpPat    
   
Set theMatches = regexOne.Execute(strFileContent)
    
'show the matching strings
For Each Match In theMatches
    res = Match.Value
    Debug.Print res
Next

End Function

To find the total, I can use this:

parse "tot[\w] . (\d (\.|,)\d{2})"

That gives me this result:

TOTAL € 22.13

The Euro sign is distorted for some reason, but I do not care about that. (There are also invoices that omit the Euro sign). The thing is, I want to extract just the numerical value from the result string (22.13). I know I can create a VBA function to do that, but it is much cleaner to do it directly in the regex pattern. I tried this:

parse "(?:(?!tot[\w] . ))(\d (\.|,)\d{2})"

But that gives me this result:

0071.21

It ignores the 'Total' phrase and returns the first match of the second part of my pattern. How can I extract just the number 22.13?

CodePudding user response:

You can use

Dim strFileContent As String
strFileContent = "<etc>" & vbLf & "BTW-nr: 0071.21.210 B 08" & vbLf & "KvK 27068921" & vbLf & "TOTAL € 22.13" & vbLf & "Maestro € 22.13" & vbLf & "<etc>."

Set regexOne = New regExp
regexOne.Global = False
regexOne.IgnoreCase = True

regexOne.pattern = "tot\w(?:.*\D)?(\d [.,]\d{2})"
   
Set theMatches = regexOne.Execute(strFileContent)
    
'show the matching strings
For Each Match In theMatches
    res = Match.SubMatches(0)
    Debug.Print res
Next

to get 22.13 as output.

The tot\w(?:.*\D)?(\d [.,]\d{2}) regex matches

  • tot - a tot string
  • \w - any word char (I kept it since you had it in your original pattern, but you might as well remove it)
  • (?:.*\D)? - an optional pattern matching any zero or more chars other than line break chars as many as possible and then a non-digit char (it is used to get to the "last number")
  • (\d [.,]\d{2}) - Group 1 (we'll access the value via match.SubMatches(0)): one or more digits, then a . or , and then two digits.
  • Related