How to extract name from a file name in the form "<name>_<fileNum>of<fileNumTota

Time:12-09

A user specifies a file name that is either in the form "<name>_<fileNum>of<fileNumTotal>" or simply "<name>". I need to extract the "<name>" part from the full file name.

Basically, I am looking for a solution to the method "ExtractName()" in the following example:

string fileName = "example_File";  // This variable is specified by the user
string extractedName = ExtractName(fileName);  // Must return "example_File"
fileName = "example_File2_1of5";
extractedName = ExtractName(fileName);  // Must return "example_File2"
fileName = "examp_File_3of15";
extractedName = ExtractName(fileName);  // Must return "examp_File"
fileName = "example_12of15";
extractedName = ExtractName(fileName);  // Must return "example"

Edit: Here's what I've tried so far:

string ExtractName(string fullName)
{
    return fullName.Substring(0, fullName.LastIndexOf('_'));
}

But this clearly does not work for the case where the full name is just "<name>".

Thanks

Answer:

This would be easier to parse using Regex, because you don't know how many digits either number will have.

using System;
using System.Text.RegularExpressions;

var inputs = new[]
{
    "example_File",
    "example_File2_1of5",
    "examp_File_3of15",
    "example_12of15"
};

var pattern = new Regex(@"^(.+)(_\d+of\d+)$");
foreach (var input in inputs)
{
    var match = pattern.Match(input);
    if (!match.Success)
    {
        // file doesn't end with "#of#", so use the whole input
        Console.WriteLine(input);
    }
    else
    {
        // it does end with "#of#", so use the first capture group
        Console.WriteLine(match.Groups[1].Value);
    }
}

This code prints:

example_File
example_File2
examp_File
example

The Regex pattern has three parts:

  1. ^ and $ are anchors that force the pattern to match the entire string, not just part of it.
  2. (.+) - match one or more of any character, as greedily as possible.
  3. (_\d+of\d+) - match "_#of#", where each "#" is one or more consecutive digits.
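
The pattern described above can be wrapped into the ExtractName() method the question asks for. This is a minimal sketch, not the only way to do it: the class name FileNameParser and the field name SuffixPattern are made up for illustration, and the second capture group is dropped because only the name is needed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FileNameParser
{
    // Greedy name followed by "_<digits>of<digits>" at the very end of the string.
    // SuffixPattern is a hypothetical name chosen for this sketch.
    private static readonly Regex SuffixPattern = new Regex(@"^(.+)_\d+of\d+$");

    public static string ExtractName(string fileName)
    {
        var match = SuffixPattern.Match(fileName);

        // No "_#of#" suffix: the whole input is already the name.
        // Otherwise the first capture group holds everything before the suffix.
        return match.Success ? match.Groups[1].Value : fileName;
    }
}
```

Called as FileNameParser.ExtractName(fileName), this should give the four results listed in the question. Because the match is anchored at the end of the string, an underscore inside the name itself (as in "examp_File_3of15") is left alone.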