I have created and tested this regex pattern: <\w\w:Value> SYMBOL: (P.*)=(.*)\/\/(.*)
Regex regexPattern = new Regex(@"<\w\w:Value> SYMBOL: (P.*)=(.*)\/\/(.*)");
var attributeChecker = regexPattern.Match(line);
var attributeLongDescription = attributeChecker.Groups[3].ToString().Trim();
Here is the sample input:
<AC:Value> SYMBOL: PDWFNA = 0; // Projektierung D-Weg Freimeldung nicht
// auswerten
<AC:Value> SYMBOL: PDWLE = 0; // Länge des Durchrutschweges
The results that I am getting from group three are:
Projektierung D-Weg Freimeldung nicht
Länge des Durchrutschweges
How can I get these results from group three instead?
Projektierung D-Weg Freimeldung nicht auswerten
Länge des Durchrutschweges
CodePudding user response:
You cannot capture disjoint parts of a string into a single capturing group. You need to match all the lines below your pattern match that are a continuation of the comment, and then post-process the result.
You can use the following approach:
using System;
using System.Linq;
using System.Text.RegularExpressions;

var text = @"<AC:Value> SYMBOL: PDWFNA = 0; // Projektierung D-Weg Freimeldung nicht
// auswerten
<AC:Value> SYMBOL: PDWLE = 0; // Länge des Durchrutschweges";
var matches = Regex.Matches(text, @"<\w{2}:Value> SYMBOL: (P.*)=(.*)//(.*(?:\n[\s-[\r\n]]*//.*)*)");
foreach (Match m in matches)
{
    Console.WriteLine("--- A new match ---");
    Console.WriteLine($"Group 1: {m.Groups[1].Value}");
    Console.WriteLine($"Group 2: {m.Groups[2].Value}");
    // Split Group 3 on "//", trim each piece, and join the pieces with a space
    Console.WriteLine("Group 3: {0}",
        string.Join(" ",
            m.Groups[3].Value.Split(new[] {"//"}, StringSplitOptions.RemoveEmptyEntries)
                .Select(x => x.Trim())
        )
    );
}
Output:
--- A new match ---
Group 1: PDWFNA
Group 2: 0;
Group 3: Projektierung D-Weg Freimeldung nicht auswerten
--- A new match ---
Group 1: PDWLE
Group 2: 0;
Group 3: Länge des Durchrutschweges
The (.*(?:\n[\s-[\r\n]]*//.*)*) part captures into Group 3 the rest of the current line with .* and then zero or more following lines that start with zero or more whitespace characters other than CR and LF, followed by // and anything up to the end of the line.
The string.Join(" ", m.Groups[3].Value.Split(new[] {"//"}, StringSplitOptions.RemoveEmptyEntries).Select(x => x.Trim())) call is one way of post-processing the Group 3 value. It splits the value on the // substring, strips leading and trailing whitespace from each resulting item, and joins the items into a single string with a space.
You may also use Regex.Replace(m.Groups[3].Value, @"\s*//\s*", " ").Trim() instead to make it shorter. The Trim() call removes the leading space that is left over from the first comment.
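The shorter Regex.Replace variant can be sketched as a small helper. The CommentExtractor class and the LongDescriptions method are hypothetical names used only for illustration; the pattern is the one from this answer.

```csharp
using System;
using System.Text.RegularExpressions;

static class CommentExtractor
{
    // Pattern from the answer: group 3 spans the first comment and any
    // continuation lines that start with optional whitespace and //.
    static readonly Regex SymbolLine = new Regex(
        @"<\w{2}:Value> SYMBOL: (P.*)=(.*)//(.*(?:\n[\s-[\r\n]]*//.*)*)");

    // Hypothetical helper: returns the joined comment text of every match.
    public static string[] LongDescriptions(string text)
    {
        var matches = SymbolLine.Matches(text);
        var result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            // Collapse each "<whitespace>//<whitespace>" separator to one space,
            // then trim the space left over from the leading "// ".
            result[i] = Regex.Replace(
                matches[i].Groups[3].Value, @"\s*//\s*", " ").Trim();
        }
        return result;
    }
}
```

Calling CommentExtractor.LongDescriptions(text) on the sample input should return one string per SYMBOL line, with continuation comments merged into the first comment.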
CodePudding user response:
After matching, you can post-process the group 3 value by removing each newline, the spaces that follow it, and the // marker. The pattern:
<\w\w:Value> SYMBOL: (P[^=\n]*)=(.*?)//(.*(?:\n[\p{Zs}\t]*//.*)*)
The pattern matches:
<\w\w:Value> SYMBOL: matches literally
(P[^=\n]*) is capture group 1: P followed by zero or more characters other than = or a newline
= matches literally
(.*?) is capture group 2: any characters except a newline, non-greedy
// matches literally
( opens capture group 3
.* matches the rest of the line
(?: opens a non-capturing group
\n[\p{Zs}\t]*//.* matches a newline, optional spaces or tabs, then // and the rest of the line
)* closes the non-capturing group and repeats it zero or more times
) closes group 3
For example, printing only group 3 after the replacement:
using System;
using System.Text.RegularExpressions;

string pattern = @"<\w\w:Value> SYMBOL: (P[^=\n]*)=(.*?)//(.*(?:\n[\p{Zs}\t]*//.*)*)";
string input = @"<AC:Value> SYMBOL: PDWFNA = 0; // Projektierung D-Weg Freimeldung nicht
// auswerten
<AC:Value> SYMBOL: PDWLE = 0; // Länge des Durchrutschweges";
foreach (Match match in Regex.Matches(input, pattern))
{
    // Remove each line break, the optional indentation after it, and the // marker
    Console.WriteLine(Regex.Replace(match.Groups[3].Value, @"\r?\n[\p{Zs}\t]*//", "").Trim());
}
Output:
Projektierung D-Weg Freimeldung nicht auswerten
Länge des Durchrutschweges