How to get matches from multiple lines using regex in C#

Time:02-15

I ACE | AA33cc55BB44                      |         |            |                I
I     | AAAAAA-BB2CC-4424-1-22            |  11.113 | 10.09.2022 | bCa0111.XAC    I
I     |                                   |         |            |                I
I BAC | Aa315c5cab44                      |         |            |                I
I     | 5564aa-BB2CC-44gd-1-22            |  21.334 | 10.09.2022 | Aba0221.XAC    I
I     |                                   |         |            |                I
I CAC | aacccc54BB44                      |         |            |                I
I     | AAAAAA-BB2CC-aaaa-1-22            |  61.222 | 10.09.2022 | bCa0232.XAC    I
I     |                                   |         |            |                I
I DAC | ii2ii2ii2664                      |         |            |                I
I     | BBBBBB-BB2CC-4424-1-22            |  81.888 | 10.09.2022 | Aba0243.XAC    I


I have used this pattern: \| (.*) \| \d{2}\.\d{3} \| \d{1,2}\.\d{1,2}\.\d{4} \| (.*) \I

Attributes that I want to grab:

Group I:

AA33cc55BB44                      
AAAAAA-BB2CC-4424-1-22
bCa0111.XAC 

Group II:

Aa315c5cab44
5564aa-BB2CC-44gd-1-22 
Aba0221.XAC

Group III:

aacccc54BB44
AAAAAA-BB2CC-aaaa-1-22
bCa0232.XAC

Group IV:

ii2ii2ii2664
BBBBBB-BB2CC-4424-1-22
Aba0243.XAC

Can anyone help me extract only these attributes from this text?

CodePudding user response:

You can use

(?m)^[^|\n]*\|[ \t]*([^\s|]+).*\n[^|\n]*\|[ \t]*(\S+)\s*(?:\|[^|\n]*){2}\|[ \t]*(\S+)

Details:

  • (?m) - RegexOptions.Multiline option on
  • ^ - start of a line
  • [^|\n]* - zero or more chars other than a newline and |
  • \| - a | char
  • [ \t]* - zero or more spaces or TABs (you may use [\p{Zs}\t]* here to match any Unicode horizontal whitespaces)
  • ([^\s|]+) - Group 1: one or more chars other than whitespace and |
  • .* - the rest of the line
  • \n - a newline char
  • [^|\n]*\|[ \t]* - zero or more chars other than a newline and |, then a | char and zero or more spaces or TABs
  • (\S+) - Group 2: one or more non-whitespace chars
  • \s* - zero or more whitespaces
  • (?:\|[^|\n]*){2} - two sequences of | and then zero or more chars other than | and a newline
  • \| - a | char
  • [ \t]* - zero or more spaces or TABs
  • (\S+) - Group 3: one or more non-whitespace chars.

In C#:

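// Requires: using System; using System.Text.RegularExpressions;
// `text` holds the multi-line input shown in the question.
var pattern = @"^[^|\n]*\|[ \t]*([^\s|]+).*\n[^|\n]*\|[ \t]*(\S+)\s*(?:\|[^|\n]*){2}\|[ \t]*(\S+)";
var matches = Regex.Matches(text, pattern, RegexOptions.Multiline);
foreach (Match m in matches)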
{
    Console.WriteLine("--- New match ---");
    Console.WriteLine(m.Groups[1].Value);
    Console.WriteLine(m.Groups[2].Value);
    Console.WriteLine(m.Groups[3].Value);
}
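
For reference, here is a self-contained sketch of the same approach that can be compiled and run as is. It uses named groups instead of numbered ones, which is only a readability choice. The names RecordExtractor, Extract, and the group names id, code and file are made up for this illustration, and the input is two records copied from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RecordExtractor
{
    // Same pattern as in the answer, with named groups (hypothetical names).
    private static readonly Regex RecordPattern = new Regex(
        @"^[^|\n]*\|[ \t]*(?<id>[^\s|]+).*\n[^|\n]*\|[ \t]*(?<code>\S+)\s*(?:\|[^|\n]*){2}\|[ \t]*(?<file>\S+)",
        RegexOptions.Multiline);

    // Returns one tuple per two-line record found in the text.
    public static List<(string Id, string Code, string File)> Extract(string text)
    {
        var records = new List<(string Id, string Code, string File)>();
        foreach (Match m in RecordPattern.Matches(text))
        {
            records.Add((m.Groups["id"].Value,
                         m.Groups["code"].Value,
                         m.Groups["file"].Value));
        }
        return records;
    }
}

public static class Program
{
    public static void Main()
    {
        // Two records from the sample input; the blank separator row is skipped
        // because it has no non-whitespace value after the first |.
        string text =
            "I ACE | AA33cc55BB44                      |         |            |                I\n" +
            "I     | AAAAAA-BB2CC-4424-1-22            |  11.113 | 10.09.2022 | bCa0111.XAC    I\n" +
            "I     |                                   |         |            |                I\n" +
            "I BAC | Aa315c5cab44                      |         |            |                I\n" +
            "I     | 5564aa-BB2CC-44gd-1-22            |  21.334 | 10.09.2022 | Aba0221.XAC    I\n";

        foreach (var r in RecordExtractor.Extract(text))
        {
            Console.WriteLine($"{r.Id} / {r.Code} / {r.File}");
        }
    }
}
```

Since . in .NET matches every character except \n, the .*\n part should also cope with Windows line endings: the trailing \r is simply consumed by .* before the \n.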