I am trying to create a pattern for the following text:
not included
4680145876
some text some text ffgg
30905102511638
1
other text other text
no included
Here's my attempt:
^\s*\d{6,10}(?:\n(?!\s*\d{1,}\n).*){5}
I will be using this pattern in VBA. The expected output to be highlighted is the following five lines:
468049876
some text some text ffgg
30905102639638
1
other text other text
Update: I have run into a problem. Suppose the text looks like this:
not included
468041476
some text some text ffgg
31605102764638
1
other text other text
extra line
416524332
some text some text ffgg
30905103594638
1
other text other text
extra line
6354422
no included
Here I need the block to follow this sequence:
1- A number of 6 to 12 digits
2- Some text on one line
3- A number of exactly 14 digits
4- A number of 1 to 3 digits
5- Text (this is the problem: the text may span two lines instead of one, and I need the extra line joined onto it as a single line)
So the output for the example text would be:
468049876
some text some text ffgg
30905103685638
1
other text other text extra line
and
416524332
some text some text ffgg
30905101497638
1
other text other text extra line
In other words, that text should yield only two blocks, each of five lines.
I am using this code:
With CreateObject("VBScript.RegExp")
    .Global = True: .MultiLine = True: .IgnoreCase = True
    .Pattern = sPattern
    If .Test(sInput) Then
        Set col = .Execute(sInput)
        For i = 0 To col.Count - 1
            ' Split the whole match into its lines
            x = Split(col.Item(i), vbLf)
            cnt = cnt + 1
            For j = LBound(x) To UBound(x)
                a(i + 1, j + 1) = Application.WorksheetFunction.Clean(Trim(x(j)))
            Next j
        Next i
    End If
End With
Now, when looping through the matches, the variable x ends up with more than five items, but I expected only five. How can I pick up the second group of each match separately?
Answer:
It seems to me you should check for a 6 to 12 digit number (the range you specified) in the negative lookahead, and to match whitespace other than line breaks you can use [^\S\r\n]:
^( *\d{6,12} *\n.*\n *\d{14} *\n *\d{1,3} *)((?:\n(?! *\d{6,12} *$).+)*)
Details:
^ - start of a line (remember to use .MultiLine = True)
( - Group 1 start:
 * - zero or more spaces
\d{6,12} - six to twelve digits
 * - zero or more spaces
\n.* - a line
\n *\d{14} * - a line with 14 digits enclosed with zero or more spaces
\n *\d{1,3} * - a line with one to three digits enclosed with zero or more spaces
) - end of Group 1
((?:\n(?! *\d{6,12} *$).+)*) - Group 2:
(?: - start of a non-capturing group:
\n - an LF line ending
(?! *\d{6,12} *$) - not immediately followed by zero or more spaces, six to twelve digits, zero or more spaces and the end of a line
.+ - a non-empty line (one or more chars other than line break chars, as many as possible)
)* - end of the grouping, zero or more occurrences
) - end of Group 2.
After getting the matches, Group 2 contains the last block of lines, so you can manipulate that text as much as you want and then concatenate it with the Group 1 value.
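The group handling described above could be sketched in VBA roughly as follows. This is only an illustration under some assumptions: ExtractBlocks is a hypothetical helper name, the input is assumed to use LF line endings (CRLF is normalized first as a precaution), and the trailing lines of Group 2 are joined with single spaces. The capturing groups are read through the SubMatches collection of each match, which is zero-based, so Group 1 is SubMatches(0) and Group 2 is SubMatches(1).

```vba
Option Explicit

' Hypothetical helper (not from the original post): returns each match
' as one LF-separated block of five lines.
Function ExtractBlocks(ByVal sInput As String) As Collection
    Dim re As Object, m As Object
    Dim head As String, tail As String
    Dim result As New Collection

    ' Assumption: the pattern expects LF line endings, so normalize CRLF first
    sInput = Replace(sInput, vbCrLf, vbLf)

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.MultiLine = True          ' required so ^ and $ match at line boundaries
    re.Pattern = "^( *\d{6,12} *\n.*\n *\d{14} *\n *\d{1,3} *)((?:\n(?! *\d{6,12} *$).+)*)"

    For Each m In re.Execute(sInput)
        head = m.SubMatches(0)   ' Group 1: the first four lines
        tail = m.SubMatches(1)   ' Group 2: the trailing text lines, each preceded by vbLf
        ' Fold the trailing lines into a single line
        tail = Trim$(Replace(tail, vbLf, " "))
        result.Add head & vbLf & tail
    Next m

    Set ExtractBlocks = result
End Function
```

With the second sample text from the question, each item of the returned collection should split on vbLf into five items, with "other text other text extra line" as the fifth, so the existing inner loop over x can stay as it is.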