Extract multiple strings via Regex in c#

Time:07-07

Based on this question, I want to extract multiple data points from a text file. The text file is basically a C# string with a Key: Value scheme. Each key-value pair is on one line. I created the following code:

var matches = Regex.Matches(
    pageText,
    "^Ort Lieferadresse: (?<deliveryAdress>.*)$|^Referenz: (?<reference>.*)$|^Lademittel: (?<loading>.*)$|^Plombennummer: (?<plombe>.*)$|^Bemerkung: (?<remarks>.*)$",
    RegexOptions.Multiline);

This works, but I am having trouble extracting the actual captures, because it returns five matches. I also tried the Match method, but then only the very first group is found.

Is there any way to return all captures in one go?

Here's some sample data:

Verladeplan
Erzeugt von: moehlerj
Erzeugt am: 01.03.2022
Ausliefertermin: 03.03.2022-01:00:00
Darstellung Verladeplan
Ladeinformationen
Ausliefertermin: 03.03.2022-01:00:00
Ort Lieferadresse: Foo
Referenz: Bar
Lademittel: 40' Container
Lademeter: 1176.000
Gesamtanzahl Paletten: 24
Gesamtbruttogewicht: 6669.0
Plombennummer: keine, da Abholung im LKW
Bemerkung: Kennzeichen: AB12345 / CD67890
Containernummer:
TARA:
Seite 1 von 2

User response:

You can use

using System;
using System.Linq;
using System.Text.RegularExpressions;

var text = "Ort Lieferadresse: deliveryAdress\r\nReferenz: reference\r\nLademittel: loading\r\nPlombennummer: plombe\r\nBemerkung: remarks";
var pattern = @"^Ort Lieferadresse: (?<deliveryAdress>[^\r\n]*)\r?$|^Referenz: (?<reference>[^\r\n]*)\r?$|^Lademittel: (?<loading>[^\r\n]*)\r?$|^Plombennummer: (?<plombe>[^\r\n]*)\r?$|^Bemerkung: (?<remarks>[^\r\n]*)\r?$";
var results = Regex.Matches(text, pattern, RegexOptions.Multiline)
        .Cast<Match>()
        .SelectMany(m => m.Groups.Skip(1))
        .Where(n => n.Success);
foreach (Group grp in results)
    Console.WriteLine("{0}: {1}", grp.Name, grp.Value);

See the C# demo yielding

deliveryAdress: deliveryAdress
reference: reference
loading: loading
plombe: plombe
remarks: remarks

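First, to support CRLF line endings: in a .NET regex, . matches any character except \n, so it also matches \r, and with RegexOptions.Multiline the $ anchor matches only before \n (or at the end of the string). With CRLF input, .* would therefore capture the trailing \r. I suggest replacing .* with [^\r\n]* and adding an optional CR pattern (\r?) before the $ end-of-line anchor.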
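To make the CRLF point concrete, here is a minimal sketch comparing the two capture styles on a single key. The two-line input and the variable names (crlfText, naive, robust) are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical two-line input with Windows (CRLF) line endings.
var crlfText = "Referenz: Bar\r\nLademittel: 40' Container\r\n";

// With RegexOptions.Multiline, $ matches just before '\n', and '.' matches '\r',
// so the greedy .* also captures the carriage return.
var naive = Regex.Match(crlfText, @"^Referenz: (?<reference>.*)$", RegexOptions.Multiline);

// Excluding CR/LF from the capture and consuming an optional CR before $ avoids that.
var robust = Regex.Match(crlfText, @"^Referenz: (?<reference>[^\r\n]*)\r?$", RegexOptions.Multiline);

Console.WriteLine(naive.Groups["reference"].Value.Length);   // 4: "Bar" plus the trailing '\r'
Console.WriteLine(robust.Groups["reference"].Value.Length);  // 3: "Bar"
```

The stray \r is invisible when printed to a console, which is why the length is compared here instead of the value itself.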

Then, .Cast<Match>() gets a list of all match objects returned by the Regex.Matches(text, pattern, RegexOptions.Multiline), the .SelectMany(m => m.Groups.Skip(1)) gets the Groups property of each match object without the zeroth item (it is the whole match that we do not need), and .Where(n => n.Success) will only keep the groups that participated in the match.
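If you would rather look values up by group name than iterate over them, the same pipeline can end in ToDictionary. This is only a sketch under a few assumptions: the input is an excerpt of the question's sample data joined with CRLF, the name captures is made up, and each key line is assumed to occur at most once (ToDictionary throws on duplicate keys). Cast<Group>() is added so the code also compiles on .NET Framework, where GroupCollection is not generic.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Excerpt of the sample data from the question, joined with CRLF for illustration.
var pageText = string.Join("\r\n",
    "Ort Lieferadresse: Foo",
    "Referenz: Bar",
    "Lademittel: 40' Container",
    "Lademeter: 1176.000",
    "Plombennummer: keine, da Abholung im LKW",
    "Bemerkung: Kennzeichen: AB12345 / CD67890");

var pattern = @"^Ort Lieferadresse: (?<deliveryAdress>[^\r\n]*)\r?$|^Referenz: (?<reference>[^\r\n]*)\r?$|^Lademittel: (?<loading>[^\r\n]*)\r?$|^Plombennummer: (?<plombe>[^\r\n]*)\r?$|^Bemerkung: (?<remarks>[^\r\n]*)\r?$";

// One dictionary entry per successful named group, keyed by the group name.
// Assumption: every key line occurs at most once, otherwise ToDictionary throws.
var captures = Regex.Matches(pageText, pattern, RegexOptions.Multiline)
    .Cast<Match>()
    .SelectMany(m => m.Groups.Cast<Group>().Skip(1)) // skip group 0, the whole match
    .Where(g => g.Success)
    .ToDictionary(g => g.Name, g => g.Value);

Console.WriteLine(captures["deliveryAdress"]); // Foo
Console.WriteLine(captures["remarks"]);        // Kennzeichen: AB12345 / CD67890
```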

User response:

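If these lines always appear in a fixed order, replace the regex alternation | with a line-break pattern. You can then also drop the ^ and $ anchors around each line break. The lines between Lademittel and Plombennummer are skipped by a sub-pattern that also matches line breaks: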

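var match = Regex.Match(
    pageText,
    // [^\r\n]* keeps CR out of the captures; \r?\n accepts both LF and CRLF line endings.
    // (?:[^\r\n]*\r?\n)*? lazily skips the lines between Lademittel and Plombennummer.
    @"^Ort Lieferadresse: (?<deliveryAdress>[^\r\n]*)\r?\nReferenz: (?<reference>[^\r\n]*)\r?\nLademittel: (?<loading>[^\r\n]*)\r?\n(?:[^\r\n]*\r?\n)*?Plombennummer: (?<plombe>[^\r\n]*)\r?\nBemerkung: (?<remarks>[^\r\n]*)\r?$",
    RegexOptions.Multiline);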

// Test
if (match.Success) {
    Console.WriteLine(match.Groups["deliveryAdress"].Value);
    Console.WriteLine(match.Groups["reference"].Value);
    Console.WriteLine(match.Groups["loading"].Value);
    Console.WriteLine(match.Groups["plombe"].Value);
    Console.WriteLine(match.Groups["remarks"].Value);
} else {
    Console.WriteLine("no match");
}

This way, the regex finds one single match that contains all five groups.


If the information can appear in any order, I suggest not using a regex at all and instead loading the file with:

IEnumerable<string> lines = File.ReadLines(path);

Then, insert the information into a dictionary. This lets you access the desired data easily, and the dictionary automatically contains all available keys.

var dict = lines
    .Select(l => (text: l, index: l.IndexOf(": ")))
    .Where(t => t.index > 0)
    .Select(t => (key: t.text[0..t.index], value: t.text[(t.index + 2)..]))
    .DistinctBy(kv => kv.key) // Because Ausliefertermin occurs twice
    .ToDictionary(kv => kv.key, kv => kv.value);

This test

Console.WriteLine($"Ort Lieferadresse = {dict["Ort Lieferadresse"]}");
Console.WriteLine($"Referenz = {dict["Referenz"]}");
Console.WriteLine($"Lademittel = {dict["Lademittel"]}");
Console.WriteLine($"Plombennummer = {dict["Plombennummer"]}");
Console.WriteLine($"Bemerkung = {dict["Bemerkung"]}");

yields

Ort Lieferadresse = Foo
Referenz = Bar
Lademittel = 40' Container
Plombennummer = keine, da Abholung im LKW
Bemerkung = Kennzeichen: AB12345 / CD67890
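Note that the dictionary indexer throws a KeyNotFoundException when a key is absent, for example for a line such as "Containernummer:" that has no value and therefore no ": " separator. The following sketch uses TryGetValue instead. The in-memory lines array stands in for File.ReadLines(path), the "(not present)" fallback is an arbitrary choice, and DistinctBy assumes .NET 6 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A few lines from the sample data; stands in for File.ReadLines(path).
var lines = new[]
{
    "Ausliefertermin: 03.03.2022-01:00:00",
    "Referenz: Bar",
    "Containernummer:",
    "Seite 1 von 2"
};

var dict = lines
    .Select(l => (text: l, index: l.IndexOf(": ", StringComparison.Ordinal)))
    .Where(t => t.index > 0)                      // drops lines without a "Key: Value" shape
    .Select(t => (key: t.text[..t.index], value: t.text[(t.index + 2)..]))
    .DistinctBy(kv => kv.key)                     // keeps the first occurrence of a repeated key
    .ToDictionary(kv => kv.key, kv => kv.value);

// TryGetValue avoids an exception for keys that were filtered out or never present.
string referenz = dict.TryGetValue("Referenz", out var r) ? r : "(not present)";
string container = dict.TryGetValue("Containernummer", out var c) ? c : "(not present)";

Console.WriteLine($"Referenz = {referenz}");          // Referenz = Bar
Console.WriteLine($"Containernummer = {container}");  // Containernummer = (not present)
```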