Regex pattern that gets a string from two lines

I have created and tested this regex pattern: <\w\w:Value> SYMBOL: (P.*)=(.*)\/\/(.*)

 Regex regexPattern = new Regex(@"<\w\w:Value> SYMBOL: (P.*)=(.*)\/\/(.*)");
 var attributeChecker = regexPattern.Match(line);
 var attributeLongDescription = attributeChecker.Groups[3].ToString().Trim();

Here is the sample input:

<AC:Value> SYMBOL: PDWFNA     = 0;        // Projektierung D-Weg Freimeldung nicht
                                          // auswerten
<AC:Value> SYMBOL: PDWLE      = 0;        // Länge des Durchrutschweges

The results that I am getting from group 3 are:

Projektierung D-Weg Freimeldung nicht
Länge des Durchrutschweges

How can I get these results from group 3 instead?

Projektierung D-Weg Freimeldung nicht auswerten
Länge des Durchrutschweges

CodePudding user response:

You cannot capture disjoint parts of a string into a single capturing group. You need to match all the lines after your pattern match that are a continuation of the comment, and then post-process the result.

You can use the following approach:

using System;
using System.Linq;
using System.Text.RegularExpressions;

var text = @"<AC:Value> SYMBOL: PDWFNA     = 0;        // Projektierung D-Weg Freimeldung nicht
                                          // auswerten
<AC:Value> SYMBOL: PDWLE      = 0;        // Länge des Durchrutschweges";
var matches = Regex.Matches(text, @"<\w{2}:Value> SYMBOL: (P.*)=(.*)//(.*(?:\n[\s-[\r\n]]*//.*)*)");
foreach (Match m in matches) 
{
    Console.WriteLine("--- A new match ---");
    Console.WriteLine($"Group 1: {m.Groups[1].Value}");
    Console.WriteLine($"Group 2: {m.Groups[2].Value}");
    Console.WriteLine("Group 3: {0}",
        string.Join(" ", 
            m.Groups[3].Value.Split(new[] {"//"}, StringSplitOptions.RemoveEmptyEntries)
                .Select(x => x.Trim())
        )
    );
}

Output:

--- A new match ---
Group 1: PDWFNA     
Group 2:  0;        
Group 3: Projektierung D-Weg Freimeldung nicht auswerten
--- A new match ---
Group 1: PDWLE      
Group 2:  0;        
Group 3: Länge des Durchrutschweges


The (.*(?:\n[\s-[\r\n]]*//.*)*) part captures into Group 3 the rest of the current line with .*, and then zero or more following lines that start with optional whitespace other than CR and LF, followed by // and anything up to the end of the line. [\s-[\r\n]] is a .NET character class subtraction: any whitespace character except CR and LF.
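To illustrate the character class subtraction used above, here is a minimal sketch. The class and method names (WhitespaceDemo, IsNonLineBreakWhitespace) are made up for this example and are not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

static class WhitespaceDemo
{
    // Hypothetical helper: true when the whole input is made of one or more
    // whitespace characters other than CR and LF.
    public static bool IsNonLineBreakWhitespace(string s) =>
        Regex.IsMatch(s, @"\A[\s-[\r\n]]+\z");

    static void Main()
    {
        Console.WriteLine(IsNonLineBreakWhitespace(" \t")); // True
        Console.WriteLine(IsNonLineBreakWhitespace(" \n")); // False
    }
}
```

This is why the indentation before a continuation // can be skipped without the pattern running across a line break on its own.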

The string.Join(" ", m.Groups[3].Value.Split(new[] {"//"}, StringSplitOptions.RemoveEmptyEntries).Select(x => x.Trim())) call is one way of post-processing the Group 3 value: it is split on the // substring, each resulting item is stripped of leading/trailing whitespace, and the items are joined into a single string with a space.

You may also use Regex.Replace(m.Groups[3].Value, @"\s*//\s*", " ").Trim() instead to make it shorter. The Trim() call removes the space that is left before the first word of the comment.
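The Regex.Replace variant could be wrapped in a small helper, sketched below. CommentJoiner and Join are hypothetical names, and the sample string stands in for a Group 3 value.

```csharp
using System;
using System.Text.RegularExpressions;

static class CommentJoiner
{
    // Hypothetical helper: collapses every "//" marker together with the
    // whitespace around it (line breaks included) into a single space,
    // then trims both ends of the result.
    public static string Join(string rawComment) =>
        Regex.Replace(rawComment, @"\s*//\s*", " ").Trim();

    static void Main()
    {
        var raw = " Projektierung D-Weg Freimeldung nicht\n      // auswerten";
        Console.WriteLine(Join(raw)); // Projektierung D-Weg Freimeldung nicht auswerten
    }
}
```

Note that this replaces any // inside the comment text as well (for example in a URL), so it is only safe when the comment body itself never contains //.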

CodePudding user response:

After matching, you can post-process the value of group 3 by removing each newline together with the following spaces and //. The pattern:

<\w\w:Value> SYMBOL: (P[^=\n]*)=(.*?)//(.*(?:\n[\p{Zs}\t]*//.*)*)

The pattern matches:

<\w\w:Value> SYMBOL: Match < followed by two word characters, then :Value> SYMBOL: literally
(P[^=\n]*) Capture group 1, match P followed by zero or more characters other than = or a newline
= Match literally
(.*?) Capture group 2, match any character except a newline, as few as possible
// Match literally
( Capture group 3
- .* Match the rest of the line
- (?: Non-capturing group
  - \n[\p{Zs}\t]*//.* Match a newline, optional spaces or tabs, then // and the rest of the line
- )* Close the non-capturing group and repeat it zero or more times
) Close group 3


For example, printing only group 3 after the replacement:

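using System;
using System.Text.RegularExpressions;

string pattern = @"<\w\w:Value> SYMBOL: (P[^=\n]*)=(.*?)//(.*(?:\n[\p{Zs}\t]*//.*)*)";
string input = @"<AC:Value> SYMBOL: PDWFNA     = 0;        // Projektierung D-Weg Freimeldung nicht
                                            // auswerten
    <AC:Value> SYMBOL: PDWLE      = 0;        // Länge des Durchrutschweges";

foreach (Match match in Regex.Matches(input, pattern))
{
    // Remove each line break plus the indentation and "//" of a continuation line,
    // then trim the whitespace left around the joined comment.
    Console.WriteLine(Regex.Replace(match.Groups[3].Value, @"\r?\n[\p{Zs}\t]*//", "").Trim());
}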

Output

Projektierung D-Weg Freimeldung nicht auswerten
Länge des Durchrutschweges
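If the descriptions are needed by symbol name rather than just printed, the same pattern could feed a lookup table. The following is only a sketch under that assumption. SymbolComments and Parse are hypothetical names, and the replacement inserts a single space between joined lines instead of relying on the space after //.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class SymbolComments
{
    static readonly Regex Entry = new Regex(
        @"<\w\w:Value> SYMBOL: (P[^=\n]*)=(.*?)//(.*(?:\n[\p{Zs}\t]*//.*)*)");

    // Hypothetical helper: maps each symbol name (group 1, trimmed) to its
    // comment (group 3) with continuation lines joined by a single space.
    // A symbol that occurs twice keeps the comment of its last occurrence.
    public static Dictionary<string, string> Parse(string text)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in Entry.Matches(text))
        {
            var name = m.Groups[1].Value.Trim();
            var description = Regex.Replace(
                m.Groups[3].Value, @"\r?\n[\p{Zs}\t]*//[\p{Zs}\t]*", " ").Trim();
            result[name] = description;
        }
        return result;
    }

    static void Main()
    {
        var text =
            "<AC:Value> SYMBOL: PDWFNA     = 0;        // Projektierung D-Weg Freimeldung nicht\n" +
            "                                          // auswerten\n" +
            "<AC:Value> SYMBOL: PDWLE      = 0;        // Länge des Durchrutschweges";
        var comments = Parse(text);
        Console.WriteLine(comments["PDWFNA"]); // Projektierung D-Weg Freimeldung nicht auswerten
        Console.WriteLine(comments["PDWLE"]);  // Länge des Durchrutschweges
    }
}
```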