Recognize block of data as one block using regex vba

Time:11-09

I am trying to create a pattern for the following text

not included

 4680145876 
some text some text ffgg   
 30905102511638 
 1
other text other text

no included

Here's my try

^\s*\d{6,10}(?:\n(?!\s*\d{1,}\n).*){5}

I will be using this pattern in VBA. The expected output to be highlighted (five lines) is:

468049876 
some text some text ffgg   
 30905102639638 
 1
other text other text

Update: I have run into a problem. Suppose the text looks like this:

not included

 468041476 
some text some text ffgg   
 31605102764638 
 1
other text other text
extra line
 416524332 
some text some text ffgg   
 30905103594638 
 1
other text other text
extra line
6354422
no included

Here I need each block to follow this sequence:

  1. A number of 6 to 12 digits
  2. Some text on one line
  3. A number of exactly 14 digits
  4. A number of 1 to 3 digits
  5. Text (this is the problem: the text may span two lines instead of one)

I need to fold that extra line into the fifth line, so the output for the example text would be:

 468049876 
some text some text ffgg   
 30905103685638 
 1
other text other text extra line

and

 416524332 
some text some text ffgg   
 30905101497638 
 1
other text other text extra line

In other words, that text should yield only two blocks, each of five lines.

  • I am using this code:

      With CreateObject("VBScript.RegExp")
          .Global = True: .MultiLine = True: .IgnoreCase = True
          .Pattern = sPattern
          If .Test(sInput) Then
              Set col = .Execute(sInput)
              For i = 0 To col.Count - 1
                  ' Split the whole match into its lines
                  x = Split(col.Item(i), vbLf)
                  cnt = cnt + 1
                  For j = LBound(x) To UBound(x)
                      a(i + 1, j + 1) = Application.WorksheetFunction.Clean(Trim(x(j)))
                  Next j
              Next i
          End If
      End With

Now, when looping through the matches, the variable x holds more than five items, although I expected only five. How can I pick up the second group of each match separately?

CodePudding user response:

It seems to me you should check for a 6-10 digit number in the negative condition, and to match whitespace other than line breaks you can use [^\S\r\n]:

^( *\d{6,12} *\n.*\n *\d{14} *\n *\d{1,3} *)((?:\n(?! *\d{6,10} *$).+)*)

Details:

  • ^ - start of a line (remember to set .MultiLine = True)
  • ( - Group 1 start:
    • " *" - zero or more spaces (a literal space, then *)
    • \d{6,12} - six to twelve digits
    • " *" - zero or more spaces
    • \n.* - a line break and the whole next line
    • \n *\d{14} * - a line with 14 digits enclosed with zero or more spaces
    • \n *\d{1,3} * - a line with one to three digits enclosed with zero or more spaces
  • ) - end of Group 1
  • ((?:\n(?! *\d{6,10} *$).+)*) - Group 2:
    • (?: - start of a non-capturing group:
      • \n - an LF line ending
      • (?! *\d{6,10} *$) - not immediately followed with zero or more spaces, six to ten digits, zero or more spaces and end of a line
      • .+ - a non-empty line (one or more chars other than line break chars, as many as possible)
    • )* - end of the grouping, zero or more occurrences.
  • ) - end of Group 2.

After getting the matches, Group 2 contains the trailing text lines of each block, so you can manipulate that text as needed and then concatenate it with the Group 1 value.
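Reading the two groups separately could look roughly like the sketch below. It is only an illustration under a few assumptions: the function name ExtractBlocks is hypothetical, the input is assumed to use LF (vbLf) line endings, and the pattern is the one from this answer with .+ matching each non-empty trailing line. Group 1 is split into its four lines, and Group 2 is collapsed into a single fifth line.

```vba
Option Explicit

' Hypothetical helper (not from the original post): returns a 1-based
' 2D array with one row per block and five columns per row.
' Assumes sInput uses LF (vbLf) line endings.
Function ExtractBlocks(ByVal sInput As String) As Variant
    Dim re As Object, matches As Object, m As Object
    Dim head As Variant, tail As String
    Dim result() As String
    Dim i As Long, j As Long

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.MultiLine = True   ' needed so ^ and $ work per line
    re.Pattern = "^( *\d{6,12} *\n.*\n *\d{14} *\n *\d{1,3} *)" & _
                 "((?:\n(?! *\d{6,10} *$).+)*)"

    Set matches = re.Execute(sInput)
    If matches.Count = 0 Then Exit Function   ' returns Empty

    ReDim result(1 To matches.Count, 1 To 5)
    For i = 0 To matches.Count - 1
        Set m = matches.Item(i)

        ' SubMatches is zero-based: (0) is Group 1, (1) is Group 2.
        ' Group 1 always spans exactly four lines.
        head = Split(m.SubMatches(0), vbLf)
        For j = 0 To 3
            result(i + 1, j + 1) = Trim$(head(j))
        Next j

        ' Group 2 is zero or more "LF + text line" chunks;
        ' join them into one line by replacing each LF with a space.
        tail = Replace(m.SubMatches(1), vbLf, " ")
        result(i + 1, 5) = Trim$(tail)
    Next i

    ExtractBlocks = result
End Function
```

With the second sample text from the question, this should give two rows whose fifth column reads "other text other text extra line". If a block can start with an 11- or 12-digit number, the lookahead range would need widening to {6,12} so that such a line also ends the previous block.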
