How to get what's between two numbers in a string?

Time:12-04

I have a lot of movie files and I want to get their production year from their file names, as below:

Input: Kingdom.of.Heaven.2005.720p.Dubbed.Film2media

Output: 2005

This code just splits all the numbers:

string[] result = Regex.Split(str, @"(\d :)");

CodePudding user response:

You must be more specific about which numbers you want. E.g.

Regex to find the year (not for splitting):

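\b(19\d\d|20\d\d)\b
  • 19\d\d selects numbers like 1948, 1989.
  • 20\d\d selects numbers like 2001, 2022.
  • \b matches a word boundary. It excludes 4-digit runs that are part of a longer number or word, such as 12005 or 2048p.
  • | means "or". Both alternatives sit inside one group so that each \b applies to both of them.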
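
As a minimal sketch of how such a pattern behaves, the snippet below uses the equivalent form \b(?:19|20)\d\d\b. The class name YearPattern, the helper FirstYear, and the second file name are hypothetical and only serve as illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class YearPattern
{
    // Grouping the alternatives keeps both \b anchors applied to either branch.
    public static readonly Regex Year = new Regex(@"\b(?:19|20)\d\d\b");

    // Returns the first year-like token, or null when none is found.
    public static string FirstYear(string fileName)
    {
        Match m = Year.Match(fileName);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        // Prints "2005".
        Console.WriteLine(FirstYear("Kingdom.of.Heaven.2005.720p.Dubbed.Film2media"));

        // Hypothetical name: 12005 and x2019 have no word boundary before the year-like digits.
        Console.WriteLine(FirstYear("Some.Movie.12005.x2019") ?? "(no match)");
    }
}
```

Note that \b treats the underscore as a word character, so a name that separates fields with underscores would probably need a different boundary check.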

But it is difficult to write a foolproof algorithm without knowing exactly how the file name is constructed. E.g. the movie "2001: A Space Odyssey" was released in 1968. So, 2001 is not a correct result here.

To omit the movie name, you could search backwards like this:

string productionYear =
    Regex.Match(str, @"\b(19\d\d|20\d\d)\b", RegexOptions.RightToLeft).Value;

A resolution such as 2048p instead of 720p would not be a problem, because the second \b requires the number to end at a word boundary.
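
The following sketch illustrates the right-to-left search on a made-up file name that contains both a year-like title and a 2048p resolution. The class LastYearFinder, the method LastYear, and the file name are assumptions for illustration, not part of the original question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LastYearFinder
{
    // RightToLeft makes the engine start scanning at the end of the input,
    // so the first match returned is the right-most year-like token.
    private static readonly Regex YearFromRight =
        new Regex(@"\b(?:19|20)\d\d\b", RegexOptions.RightToLeft);

    // Returns the right-most year-like token, or null when none is found.
    public static string LastYear(string fileName)
    {
        Match m = YearFromRight.Match(fileName);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        // Hypothetical file name: the title starts with 2001, the resolution is 2048p.
        // Prints "1968": 2048p has no word boundary after the digits, and 2001 is further left.
        Console.WriteLine(LastYear("2001.A.Space.Odyssey.1968.2048p.BluRay"));
    }
}
```

This still assumes that no year-like field follows the production year in the file name.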


If the production year is always the 4th item from the right, then a better way to get it would be:

string[] parts = str.Split('.');
string productionYear = parts[^4]; // C# 8.0+, .NET Core
// or
string productionYear = parts[parts.Length - 4]; // C# < 8 or .NET Framework

Note that the regular expression you pass to Regex.Split describes the separators, not the values that are returned.
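
To make the separator point concrete, here is a small sketch that splits on a literal dot and then checks that the 4th field from the right actually looks like a year. The helper name YearByPosition and the validation step are additions for illustration, under the assumption that fields are dot-separated.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Hypothetical helper: returns the 4th field from the right if it looks like a year, else null.
    public static string YearByPosition(string fileName)
    {
        // The pattern describes the separator (a literal dot); the pieces between the dots are returned.
        string[] parts = Regex.Split(fileName, @"\.");
        if (parts.Length < 4) return null;

        string candidate = parts[parts.Length - 4];
        // Guard against names that do not follow the assumed layout.
        return Regex.IsMatch(candidate, @"^(?:19|20)\d\d$") ? candidate : null;
    }

    public static void Main()
    {
        // Prints "2005".
        Console.WriteLine(YearByPosition("Kingdom.of.Heaven.2005.720p.Dubbed.Film2media"));
    }
}
```

For a fixed single-character separator, str.Split('.') gives the same pieces; Regex.Split is used here only to show which part of the input the pattern describes.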

CodePudding user response:

I would not split the string; I would match the field I need instead. Also, consider matching \d{4} rather than \d if you want to be sure to get years and not other fields, such as the resolution in your example.

CodePudding user response:

You can try this:

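// Requires: using System; using System.Linq; using System.Text.RegularExpressions;
var regex = new Regex(@"\b\d{4}\b");
var myInput = "Kingdom.of.Heaven.2005.720p.Dubbed.Film2media";
// Single() throws if the name contains no 4-digit token or more than one.
// On .NET Framework, call .Cast<Match>() before Single().
var productionYear = regex.Matches(myInput).Single().Value;

Console.WriteLine($"Production year: {productionYear}");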

Demo: https://dotnetfiddle.net/KM2PNk

Output:

Production year: 2005
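
Because Single() throws when the file name contains zero or several 4-digit tokens, a more tolerant variant might look like the sketch below. The name TryGetYear and the rule of preferring the right-most token are assumptions, not something stated in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SafeYear
{
    // Hypothetical helper: reports whether a standalone 4-digit token was found and parsed.
    public static bool TryGetYear(string fileName, out int year)
    {
        year = 0;
        MatchCollection matches = Regex.Matches(fileName, @"\b\d{4}\b");
        if (matches.Count == 0) return false;

        // Assumption: when several 4-digit tokens exist, the right-most one is the year.
        return int.TryParse(matches[matches.Count - 1].Value, out year);
    }

    public static void Main()
    {
        if (TryGetYear("Kingdom.of.Heaven.2005.720p.Dubbed.Film2media", out int year))
        {
            // Prints "Production year: 2005".
            Console.WriteLine($"Production year: {year}");
        }
    }
}
```

Indexing the MatchCollection directly avoids LINQ, so the same code should compile on both .NET Framework and .NET Core.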