How to search a text for a pattern and extract the text-part which follows the pattern?

Time:08-05

I have the following problem: I am trying to extract user names from a block of text. I normally use .NET for this and tried it with LINQ and Regex, but I can't find a solution.

The user names follow the pattern 'jane.doe' (without the quotation marks). Right now I have the following code:

Imports System.Text.RegularExpressions

Public Class Form1
    Dim arrStrSplittet As String()
    Dim strRegEx As String = "[a-z]+[.]{1}[a-z]+"
    Dim regExKriterium As Regex = New Regex(strRegEx)

    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        arrStrSplittet = stringSplitten(TextBox1.Text)
        TextBox2.Text = testFiltern(arrStrSplittet)
    End Sub

    Function testFiltern(text As String()) As String
        Dim query = From x In text Where (regExKriterium.IsMatch(x)) Select x 
        Dim strBuild As New System.Text.StringBuilder()
        For Each y As String In query
            strBuild.AppendLine(y)
        Next
        MsgBox(strBuild.ToString())
        Return strBuild.ToString()
    End Function

    Public Function stringSplitten(text As String) As String()
        Dim arrX = Split(text, vbNewLine)
        Return arrX
    End Function

End Class

I tested it with the following input:

Type Status Name
Benutzer Mustermann, Max (Server-name\max.mustermann)
Benutzer Normalverbraucher, Otto (Server-name\otto.normalverbraucher)
Benutzer Doe, Jane (Server-name\jane.doe)
Benutzer Svensson, Kalle (Server-name\kalle.svensson)
Benutzer Borg, Joe (Server-name\joe.borg)

With the code above I get the following output:

Benutzer Mustermann, Max (Server-name\max.mustermann)
Benutzer Normalverbraucher, Otto (Server-name\otto.normalverbraucher)
Benutzer Doe, Jane (Server-name\jane.doe)
Benutzer Svensson, Kalle (Server-name\kalle.svensson)
Benutzer Borg, Joe (Server-name\joe.borg)

The output should be:

max.mustermann
otto.normalverbraucher
jane.doe
kalle.svensson
joe.borg

Is it even possible to transform the element x inside the LINQ query? Does anyone have another idea for how to solve this? Currently I have a working (but pretty ugly) solution using InStr.

I hope someone can help me. Thanks in advance! Misao

CodePudding user response:

This is the closest I can get using .NET, LINQ, and Regex with the given input format, written for readability:

var textBook = @"Type Status Name
                 Benutzer Mustermann, Max (Server-name\max.mustermann)
                 Benutzer Normalverbraucher, Otto (Server-name\otto.normalverbraucher)
                 Benutzer Doe, Jane (Server-name\jane.doe)
                 Benutzer Svensson, Kalle (Server-name\kalle.svensson)
                 Benutzer Borg, Joe (Server-name\joe.borg)";

//split by new lines
string[] lines = textBook.Split(
new string[] { Environment.NewLine },
StringSplitOptions.None
);

//keep only the lines that contain a username matching the regex
var removedWordWithoutServerName = lines.Where(x => Regex.IsMatch(x, "[a-z]+[.]{1}[a-z]+")).ToList();

var userNames = new List<string>();

//split by Server-name\
foreach (var serverNameAndUserName in removedWordWithoutServerName.Select(serverName => serverName.Split(@"Server-name\")))
{
    //Add only matching Regex and replace ")" with ""
    userNames.AddRange(from s in serverNameAndUserName
       where Regex.IsMatch(s, "[a-z]+[.]{1}[a-z]+")
       select s.Replace(")", ""));
}

var returnUsernames = userNames;

Or a more compact LINQ version that is less readable:

var textBook = @"Type Status Name
                 Benutzer Mustermann, Max (Server-name\max.mustermann)
                 Benutzer Normalverbraucher, Otto (Server-name\otto.normalverbraucher)
                 Benutzer Doe, Jane (Server-name\jane.doe)
                 Benutzer Svensson, Kalle (Server-name\kalle.svensson)
                 Benutzer Borg, Joe (Server-name\joe.borg)";
            
var userNames = new List<string>();

textBook.Split(new string[] { Environment.NewLine },StringSplitOptions.None)
         .Where(x => Regex.IsMatch(x, "[a-z]+[.]{1}[a-z]+"))
         .Select(x => x.Split(@"Server-name\")).ToList()
         .ForEach(a =>
            {
                userNames.AddRange(from s in a
                    where Regex.IsMatch(s, "[a-z]+[.]{1}[a-z]+")
                    select s.Replace(")", ""));
            });

var returnNames = userNames;
 

CodePudding user response:

The solutions you have so far are complete overkill. Regex.Match returns capture groups, which you specify with parentheses ():

Dim users = lines
    .Select(Function(l) Regex.Match(l, "\(.+?\\(.+?)\)"))
    .Where(Function(m) m.Success)
    .Select(Function (m) m.Groups(1).Captures(0).Value)
    .ToList()

The regex breaks down as follows:

  • \( an escaped opening parenthesis
  • .+? one or more of any character, lazy match (as few as possible)
  • \\ an escaped backslash
  • ( begins a capture group
  • .+? one or more of any character, again lazy
  • ) ends the capture group
  • \) an escaped closing parenthesis
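
To see how the pieces above fit together, here is a minimal, self-contained sketch that runs the pattern against the sample input from the question. The module name Program and the helper ExtractUserNames are hypothetical names chosen for this illustration, and the input is assumed to have one entry per line. It uses Groups(1).Value, which for a group that is captured only once gives the same string as Groups(1).Captures(0).Value.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module Program
    ' Hypothetical helper: returns the text between the backslash
    ' and the closing parenthesis for every line that has one.
    Function ExtractUserNames(text As String) As String()
        ' Accept both Windows (CRLF) and Unix (LF) line endings.
        Dim lines = text.Split({vbCrLf, vbLf}, StringSplitOptions.RemoveEmptyEntries)

        Return lines.
            Select(Function(l) Regex.Match(l, "\(.+?\\(.+?)\)")).
            Where(Function(m) m.Success).
            Select(Function(m) m.Groups(1).Value).
            ToArray()
    End Function

    Sub Main()
        ' Sample input taken from the question.
        Dim input = String.Join(vbLf, {
            "Type Status Name",
            "Benutzer Mustermann, Max (Server-name\max.mustermann)",
            "Benutzer Normalverbraucher, Otto (Server-name\otto.normalverbraucher)",
            "Benutzer Doe, Jane (Server-name\jane.doe)",
            "Benutzer Svensson, Kalle (Server-name\kalle.svensson)",
            "Benutzer Borg, Joe (Server-name\joe.borg)"})

        ' The header line has no parenthesised part, so it is skipped.
        For Each userName In ExtractUserNames(input)
            Console.WriteLine(userName)
        Next
    End Sub
End Module
```

Because the pattern does not hard-code "Server-name", it should also work if the server part differs from line to line, as long as each entry keeps the (server\user) shape.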
