I have a lot of movie files and I want to get their production year from their file names, as shown below:
Input: Kingdom.of.Heaven.2005.720p.Dubbed.Film2media
Output: 2005
This code just splits all the numbers:
string[] result = Regex.Split(str, @"(\d+:)");
CodePudding user response:
You must be more specific about which numbers you want. E.g.
Regex to find the year (not for splitting):
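\b(19\d\d|20\d\d)\b
19\d\d selects numbers like 1948, 1989.
20\d\d selects numbers like 2001, 2022.
\b specifies the word boundaries. It excludes numbers or words with 5 or more digits.
| means or. Both alternatives sit inside one pair of parentheses so that each \b applies to either of them; written as \b(19\d\d)|(20\d\d)\b, the first \b would bind only to 19\d\d and the last \b only to 20\d\d.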
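A minimal sketch of trying such a pattern with Regex.Match, assuming the two alternatives are grouped so that both \b anchors apply to each of them. The YearFinder class and FindYear method are hypothetical names used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class YearFinder
{
    // Hypothetical helper: returns the first standalone 19xx/20xx number,
    // or null when the file name contains none.
    public static string FindYear(string fileName)
    {
        Match m = Regex.Match(fileName, @"\b(19\d\d|20\d\d)\b");
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        // "720p" and "Film2media" are skipped: neither is a standalone 4-digit number.
        Console.WriteLine(FindYear("Kingdom.of.Heaven.2005.720p.Dubbed.Film2media")); // 2005
    }
}
```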
But it is difficult to make a foolproof algorithm without knowing exactly how the file name is constructed. E.g. the movie "2001: A Space Odyssey" was released in 1968, so 2001 is not a correct result here.
To omit the movie name, you could search backwards like this:
string productionYear =
    Regex.Match(str, @"\b(19\d\d|20\d\d)\b", RegexOptions.RightToLeft).Value;
If instead of 720p we had a resolution of 2048p, for instance, this would not be a problem, because the 2nd \b requires the number to be at the word end.
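To illustrate the difference between the default search direction and RegexOptions.RightToLeft, here is a small sketch. The file name and the RightToLeftDemo class are made up for the example; it assumes the production year is the last standalone 19xx/20xx number in the name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RightToLeftDemo
{
    private const string YearPattern = @"\b(19\d\d|20\d\d)\b";

    // Default direction: returns the leftmost match ("" if there is none).
    public static string FirstYear(string fileName) =>
        Regex.Match(fileName, YearPattern).Value;

    // RightToLeft: the search starts at the end, so the rightmost match is returned.
    public static string LastYear(string fileName) =>
        Regex.Match(fileName, YearPattern, RegexOptions.RightToLeft).Value;

    public static void Main()
    {
        // Hypothetical file name whose title itself looks like a year.
        string name = "2001.A.Space.Odyssey.1968.720p.Dubbed.Film2media";
        Console.WriteLine(FirstYear(name)); // 2001 (part of the title)
        Console.WriteLine(LastYear(name));  // 1968 (the production year)
    }
}
```

This would still pick the wrong number if another standalone 19xx/20xx value followed the year in the file name.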
If the production year was always the 4th item from the right, then a better way to get this year would be:
string[] parts = str.Split('.');
string productionYear = parts[^4]; // C# 8.0+, .NET Core 3.0+
// or
string productionYear = parts[parts.Length - 4]; // C# < 8 or .NET Framework
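A position-based lookup throws when the name has fewer than four dot-separated items, so a guarded variant may be worth considering. This is only a sketch under the same assumption (the year is always the 4th item from the right); SplitDemo and FourthFromRight are hypothetical names.

```csharp
using System;

public static class SplitDemo
{
    // Hypothetical helper: returns the 4th dot-separated item from the right,
    // or null when the name has fewer than four items.
    public static string FourthFromRight(string fileName)
    {
        string[] parts = fileName.Split('.');
        return parts.Length >= 4 ? parts[parts.Length - 4] : null;
    }

    public static void Main()
    {
        // Items: Kingdom, of, Heaven, 2005, 720p, Dubbed, Film2media
        Console.WriteLine(FourthFromRight("Kingdom.of.Heaven.2005.720p.Dubbed.Film2media")); // 2005
    }
}
```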
Note that the pattern you pass to Regex.Split designates the separators, not the returned values (text matched by capturing groups is additionally included in the result).
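The effect of Regex.Split can be seen in a short sketch like the following, where the digits act as separators. The input string and the RegexSplitDemo class are chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSplitDemo
{
    // Joins the pieces returned by Regex.Split with '|' so they are easy to inspect.
    public static string SplitAndJoin(string input, string pattern) =>
        string.Join("|", Regex.Split(input, pattern));

    public static void Main()
    {
        string input = "Heaven.2005.720p.Dubbed";

        // The digit runs are treated as separators and dropped from the result.
        Console.WriteLine(SplitAndJoin(input, @"\d+"));   // Heaven.|.|p.Dubbed

        // With a capturing group, the captured separators are returned as well.
        Console.WriteLine(SplitAndJoin(input, @"(\d+)")); // Heaven.|2005|.|720|p.Dubbed
    }
}
```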
CodePudding user response:
I would not try to split the string; I would rather match the field you want. Also, consider matching \d{4} instead of \d if you want to be sure to get years and not other fields, such as the resolution in your example.
CodePudding user response:
You can try this:
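using System;
using System.Linq;
using System.Text.RegularExpressions;
var regex = new Regex(@"\b\d{4}\b");
var myInput = "Kingdom.of.Heaven.2005.720p.Dubbed.Film2media";
// Single() throws unless the input contains exactly one standalone 4-digit number.
var productionYear = regex.Matches(myInput).Single().Value;
Console.WriteLine($"Production year: {productionYear}");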
Demo: https://dotnetfiddle.net/KM2PNk
Output:
Production year: 2005