C#: get part of a path using regex

Time:06-13

I am using a regex to extract specific information from a string. The string value looks like this:

\subpath1\subpath2\subpathn\4xxxx_2xxxx\filename.extension
//there can be many subpaths and x is always a digit; the last directory is always number_number,
//it starts with 4, and the last part is always a file with an extension
//so I want to exclude paths such as 4xxxx_xxxx/path/file.extension

So far I have come up with this construction: (?<=\)(4[0-9])_([0-9]).?." but:

  • The last part matches any string, whether it is "sasas" or "sasas.sas"
  • I do not know if it meets all my requirements

Any suggestions on this one?

CodePudding user response:

You can use

(?<=\\)(4[0-9]*)_([0-9]*)\\[^\\]+\.\w+

See the regex demo.

Details:

  • (?<=\\) - a positive lookbehind that requires a \ char to appear immediately to the left of the current location
  • (4[0-9]*) - Group 1: 4 and then zero or more ASCII digits
  • _ - an underscore
  • ([0-9]*) - Group 2: any zero or more ASCII digits
  • \\ - a \ char
  • [^\\]+ - one or more chars other than \
  • \. - a dot
  • \w+ - one or more word chars.
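
As a rough illustration of how the pattern described above behaves, here is a minimal C# sketch. The class name, the method name and the two sample paths are made up for this example; only the pattern itself comes from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PathTailDemo
{
    // Pattern from the answer above: "\" lookbehind, 4 + digits, "_", digits, "\", file name, ".", extension.
    private static readonly Regex TailPattern =
        new Regex(@"(?<=\\)(4[0-9]*)_([0-9]*)\\[^\\]+\.\w+");

    // Hypothetical helper: returns true and the matched tail when the path qualifies.
    public static bool TryExtractTail(string path, out string tail)
    {
        Match m = TailPattern.Match(path);
        tail = m.Success ? m.Value : string.Empty;
        return m.Success;
    }

    public static void Main()
    {
        string[] samples =
        {
            @"\subpath1\subpath2\subpathn\41234_23456\filename.extension",
            @"\subpath1\41234_23456\excluded\filename.extension"
        };

        foreach (string sample in samples)
        {
            // The first sample matches; the second has an extra directory after 4xxxx_xxxx and does not.
            Console.WriteLine(TryExtractTail(sample, out string tail) ? tail : "(no match)");
        }
    }
}
```

The pattern is not anchored at the end, so a string that continues after the extension could still match; appending $ would be one way to tighten it if that matters for your data.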

CodePudding user response:

Here is an alternative approach:

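using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
string path = "subpath1/subpath2/subpathn/41234_23456/excludePath/filename.extension";
// Pick the first path segment that looks like 4<digits>_<digits>.
string importantDirectory = path.Split('/').First(x => Regex.IsMatch(x, @"4\d+_\d+"));
string fileName = Path.GetFileName(path);
string result = Path.Combine(importantDirectory, fileName);
Console.WriteLine(result);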

41234_23456\filename.extension
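
Note that the snippet above keeps the numbered directory even when other directories follow it, and First throws if no segment matches. If the stricter rule from the question is wanted (the 4xxxx_xxxx directory must be the direct parent of the file), one possible variation is to look only at the last two segments. This is a sketch under that assumption; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SegmentCheckDemo
{
    // Anchored so the whole segment must be 4<digits>_<digits>.
    private static readonly Regex NumberedDir = new Regex(@"^4\d+_\d+$");

    // Hypothetical helper: succeeds only when the parent directory is 4xxxx_xxxx
    // and the last segment has a non-empty name and extension.
    public static bool TryGetTail(string path, out string tail)
    {
        tail = string.Empty;

        // Accept both separators, since the question shows "\" and "/".
        string[] parts = path.Split(new[] { '/', '\\' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length < 2)
        {
            return false;
        }

        string dir = parts[parts.Length - 2];
        string file = parts[parts.Length - 1];

        int dot = file.LastIndexOf('.');
        bool hasExtension = dot > 0 && dot < file.Length - 1;

        if (!NumberedDir.IsMatch(dir) || !hasExtension)
        {
            return false;
        }

        tail = dir + "\\" + file;
        return true;
    }
}
```

With this check, "subpath1/subpathn/41234_23456/filename.extension" is accepted, while the path with excludePath in it is rejected.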

CodePudding user response:

A. Four digits = [0-9]{4} OR \d{4} OR \d\d\d\d. If the number can be shorter or longer, use + for "one or more": \d+_\d+

B. The path delimiter in the example is a backslash, while the example in the comment uses a slash. Both can be escaped with a preceding backslash; use [\/\\] to match either form.

C. If the file name must have an extension, the expression needs one or more valid file name characters, a dot, and again one or more valid file name characters, such as \w+\.\w+. Use \b (a word boundary) so the match stops at the end of the string/path.

Note that what counts as a valid file name varies from system to system (Mac or Windows, for example) and is in any case wider than \w, which covers only a-zA-Z0-9_ (in .NET, \w also accepts other Unicode letters and digits by default).

My suggestion:

\d+_\d+[\/\\]\w+\.\w+\b

https://regex101.com/r/Ed2H0u/1

C# code:

    var textInput = @"
\subpath1\subpath2\subpathn\4123_21253\filename.extension
\subpath2\subpathn\4123_21253\subpathn\filename.extension
";

    // Group 1 is the number_number directory, group 2 is the file name with its extension.
    var matches = Regex.Matches(textInput, @"\b[\w\/\\]+[\/\\](\d+_\d+)[\/\\](\w+\.\w+)\b");
    foreach (Match element in matches)
    {
        Console.WriteLine("Path: " + element.Value);
        Console.WriteLine("Number: " + element.Groups[1].Value);
        Console.WriteLine("FileName: " + element.Groups[2].Value);
    }

https://dotnetfiddle.net/V87CKc
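
The note above about \w being narrower than real file names could be handled by matching "anything except a separator" instead. The following is only a sketch under that assumption: the class name, method name and sample paths are invented here, and the leading 4 is taken from the question rather than from this answer's pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WideFileNameDemo
{
    // Directory 4<digits>_<digits>, then a file name made of any non-separator characters,
    // a dot, and an extension without dots or separators, anchored at the end of the input.
    private static readonly Regex Pattern =
        new Regex(@"(?:^|[\/\\])(4\d+_\d+)[\/\\]([^\/\\]+\.[^\/\\.]+)$");

    // Hypothetical helper: returns the number_number directory and file name on success.
    public static bool TryParse(string path, out string number, out string fileName)
    {
        Match m = Pattern.Match(path);
        number = m.Success ? m.Groups[1].Value : string.Empty;
        fileName = m.Success ? m.Groups[2].Value : string.Empty;
        return m.Success;
    }

    public static void Main()
    {
        // A file name with a space, a hyphen and two dots, which \w+\.\w+ would not fully cover.
        if (TryParse(@"\subpathn\4123_21253\my file-v2.tar.gz", out string number, out string file))
        {
            Console.WriteLine("Number: " + number);
            Console.WriteLine("FileName: " + file);
        }
    }
}
```

Because the file name part cannot contain a separator and the pattern ends with $, a path with an extra directory after the numbered one does not match.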
