I'm having a hard time composing a regex that meets my specific requirements.
These are:
- Match the keyword and capture the date that follows it
- If the keyword is not present, capture nothing
- If the keyword is present more than once, capture nothing
Keyword:
LT circa
Example Text:
Metall-Notierung 464,95 EUR 100 KG
* LT circa 21.04.2020 2 x 500 M Einwegtrommel 400x 150x 404mm
* LT circa 17.05.2020 2 x 500 M Einwegtrommel 400x 150x 404mm
Zolltarifnummer 80464995
Expected Result:
NULL
Example Text:
Metall-Notierung 464,95 EUR 100 KG
* LT circa 17.05.2020 2 x 500 M Einwegtrommel 400x 150x 404mm
Zolltarifnummer 80464995
Expected Result:
17.05.2020
Being a newbie to regex, this is what I have tried so far on a simplified subject:
This test is a test and nothing else
(.*test.*test.*)?(?(1)(a^):(test.*))
As you might expect, it was naive to think that this could work.
Experts anyone?
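For what it's worth, the conditional idea in the attempt above can be made to work in .NET, the engine used in the tests below. This is a minimal C# sketch, assuming .NET's (?(1)yes|no) conditional syntax (branches separated by |, not :). The optional group has to be wrapped in an atomic group (?>...); without it, the engine just backtracks, leaves group 1 empty and takes the "no" branch anyway. The sample strings are shortened versions of the examples above.

```csharp
using System;
using System.Text.RegularExpressions;

// (?>(...)?) is atomic: once group 1 has captured a second "LT circa",
// the engine cannot backtrack and retry with group 1 left empty.
// (?(1)(?!)|...) then fails outright if group 1 captured; otherwise it
// matches the keyword and captures the date in group 2.
var pattern = new Regex(
    @"(?s)^(?>(.*LT circa.*LT circa)?)(?(1)(?!)|.*LT circa\s*(\d{2}\.\d{2}\.\d{4}))");

string twice = "* LT circa 21.04.2020 2 x 500 M\n* LT circa 17.05.2020 2 x 500 M\n";
string once = "Metall-Notierung 464,95 EUR 100 KG\n* LT circa 17.05.2020 2 x 500 M\n";

Match m1 = pattern.Match(twice);
Match m2 = pattern.Match(once);
Console.WriteLine(m1.Success);         // False
Console.WriteLine(m2.Groups[2].Value); // 17.05.2020
```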
Edit:
I tested the patterns using .NET Framework 4.7.2 and NUnit:
using NUnit.Framework;
using System.Collections;
using System.Text;
using System.Text.RegularExpressions;

namespace Test.RegExpressions.Tests
{
    [TestFixture]
    public class SpecialRegexTests
    {
        [TestCaseSource(typeof(TestCaseClass), nameof(TestCaseClass.TestCases))]
        public int MatchTest(string input, string pattern, RegexOptions regexOptions)
        {
            // Number of matches: 0 means "capture nothing", 1 means the date was found
            return new Regex(pattern, regexOptions).Matches(input).Count;
        }
    }

    public static class TestCaseClass
    {
        // Keyword appears twice: expected 0 matches
        private static readonly string S0 = new StringBuilder()
            .AppendLine("Metall - Notierung 464,95 EUR 100 KG")
            .AppendLine("* LT circa 21.04.2020 2 x 500 M Einwegtrommel 400x 150x 404mm")
            .AppendLine("* LT circa 17.05.2020 2 x 500 M Einwegtrommel 400x 150x 404mm")
            .AppendLine("Zolltarifnummer 80464995")
            .ToString();

        // Keyword appears once ("LxT circa" does not count): expected 1 match
        private static readonly string S1 = new StringBuilder()
            .AppendLine("Metall - Notierung 464,95 EUR 100 KG")
            .AppendLine("* LxT circa 21.04.2020 2 x 500 M Einwegtrommel 400x 150x 404mm")
            .AppendLine("* LT circa 17.05.2020 2 x 500 M Einwegtrommel 400x 150x 404mm")
            .AppendLine("Zolltarifnummer 80464995")
            .ToString();

        private const string R0 = @"^(?:(?!.*LT circa).+\n)*(?:(?!LT circa).)*LT circa\s+(\d\d\.\d\d\.\d{4})(?!(?:.+\n)*.*LT circa)";
        // R1 uses \K, which the .NET regex engine does not support
        private const string R1 = @"(?s)^(?!(?:.*LT circa){2}).*LT circa\s*\K\d{1,2}\.\d{1,2}\.\d{4}";
        private const string R2 = @"(?s)^(?!(?:.*LT circa){2}).*LT circa\s*(\d{1,2}\.\d{1,2}\.\d{4})";

        public static IEnumerable TestCases
        {
            get
            {
                yield return new TestCaseData(S0, R0, RegexOptions.None).Returns(0);
                yield return new TestCaseData(S1, R0, RegexOptions.None).Returns(1);
                yield return new TestCaseData(S0, R1, RegexOptions.None).Returns(0);
                yield return new TestCaseData(S1, R1, RegexOptions.None).Returns(1);
                yield return new TestCaseData(S0, R2, RegexOptions.None).Returns(0);
                yield return new TestCaseData(S1, R2, RegexOptions.None).Returns(1);
            }
        }
    }
}
All of them pass the test except R1, which uses \K; the .NET regex engine does not support \K.
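Because .NET has no \K (match reset) operator, constructing a Regex from R1 should throw an ArgumentException for the unrecognized escape. A minimal C# sketch of the usual workaround, which is what R2 already does: capture the date in a group and read Groups[1] instead of the whole match value. The input string is a shortened version of S1.

```csharp
using System;
using System.Text.RegularExpressions;

const string R1 = @"(?s)^(?!(?:.*LT circa){2}).*LT circa\s*\K\d{1,2}\.\d{1,2}\.\d{4}";
const string R2 = @"(?s)^(?!(?:.*LT circa){2}).*LT circa\s*(\d{1,2}\.\d{1,2}\.\d{4})";

bool r1Rejected = false;
try
{
    _ = new Regex(R1);
}
catch (ArgumentException)
{
    // Expected: .NET rejects \K as an unrecognized escape sequence
    r1Rejected = true;
}

// R2: same logic, but the date is read from capture group 1
string input = "* LxT circa 21.04.2020 2 x 500 M\n* LT circa 17.05.2020 2 x 500 M\n";
Match m = new Regex(R2).Match(input);
Console.WriteLine(r1Rejected);
Console.WriteLine(m.Groups[1].Value); // 17.05.2020
```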
I will update my question as soon as I have more information on the regex flavor in use.
It is worth mentioning that none of these worked in the software, which may or may not be due to regex options I have no control over.
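The software's options are unknown, so the following is only an assumption about one possible cause. R0 and R2 rely on ^ meaning "start of input". If the host enables RegexOptions.Multiline, ^ also matches at the start of every line, so R2 can match on the last "LT circa" line, where the lookahead sees only one keyword. \A always means start of input regardless of options (and the inline (?s) flag already sets Singleline by itself). A C# sketch comparing the two anchors under Multiline, using a shortened S0:

```csharp
using System;
using System.Text.RegularExpressions;

const string R2 = @"(?s)^(?!(?:.*LT circa){2}).*LT circa\s*(\d{1,2}\.\d{1,2}\.\d{4})";
// Same pattern, anchored with \A instead of ^
const string R2A = @"(?s)\A(?!(?:.*LT circa){2}).*LT circa\s*(\d{1,2}\.\d{1,2}\.\d{4})";

string twice = "Metall-Notierung 464,95 EUR 100 KG\n"
             + "* LT circa 21.04.2020 2 x 500 M\n"
             + "* LT circa 17.05.2020 2 x 500 M\n"
             + "Zolltarifnummer 80464995\n";

// With Multiline, ^ can match at the start of the third line
int withCaret = Regex.Matches(twice, R2, RegexOptions.Multiline).Count;
// \A only matches at position 0, so the lookahead always sees both keywords
int withA = Regex.Matches(twice, R2A, RegexOptions.Multiline).Count;

Console.WriteLine(withCaret); // 1 (wrong: the expected result is no match)
Console.WriteLine(withA);     // 0
```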
Answer:
You may try this regex with negative lookaheads. It is slightly longer but should be more efficient than using DOTALL mode:
^(?:(?!.*LT circa).+\n)*(?:(?!LT circa).)*LT circa\s+(\d\d\.\d\d\.\d{4})(?!(?:.+\n)*.*LT circa)
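A minimal C# sketch of using this pattern to pull out the date; the helper name ExtractDate and the sample strings are illustrative only. The pattern is written out with the .+ and \s+ quantifiers and an escaped dot between month and year. It is used without RegexOptions.Multiline, so ^ anchors at the start of the input, and because . matches \r but not \n, both \n and \r\n line endings work.

```csharp
using System;
using System.Text.RegularExpressions;
#nullable enable

const string R0 = @"^(?:(?!.*LT circa).+\n)*(?:(?!LT circa).)*LT circa\s+(\d\d\.\d\d\.\d{4})(?!(?:.+\n)*.*LT circa)";

// Hypothetical helper: returns the date, or null when the keyword
// is missing or appears more than once
string? ExtractDate(string text)
{
    Match m = Regex.Match(text, R0);
    return m.Success ? m.Groups[1].Value : null;
}

string once = "Metall-Notierung 464,95 EUR 100 KG\r\n* LT circa 17.05.2020 2 x 500 M\r\nZolltarifnummer 80464995\r\n";
string twice = "* LT circa 21.04.2020 2 x 500 M\n* LT circa 17.05.2020 2 x 500 M\n";
string none = "Zolltarifnummer 80464995\n";

Console.WriteLine(ExtractDate(once));            // 17.05.2020
Console.WriteLine(ExtractDate(twice) ?? "NULL"); // NULL
Console.WriteLine(ExtractDate(none) ?? "NULL");  // NULL
```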
Answer:
You can use
(?s)^(?!(?:.*LT circa){2}).*LT circa\s*\K\d{1,2}\.\d{1,2}\.\d{4}
See the regex demo. The date regex can be enhanced, but the main point is the pattern around it.
Details:
- (?s) - the s flag, making . match any character, including line breaks
- ^ - start of string
- (?!(?:.*LT circa){2}) - fail the match if there are two or more occurrences of LT circa anywhere in the string
- .* - any zero or more chars, as many as possible
- LT circa - the keyword
- \s* - zero or more whitespace chars
- \K - match reset operator, discarding all text matched so far
- \d{1,2}\.\d{1,2}\.\d{4} - a date-like pattern
(?:0?[1-9]|[12]\d|3[01])\.(?:0?[1-9]|1[0-2])\.\d{4}(?!\d) can be a slightly more precise pattern for an arbitrary dd.MM.yyyy date (without leap year support).
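Since .NET has no \K, a C# version of this pattern has to capture the date instead. A minimal sketch combining the answer's lookahead with the stricter dd.MM.yyyy date pattern inside a capturing group; the helper name Extract is hypothetical. Like the answer's pattern, it does not check leap years, and it still accepts 31 for 30-day months.

```csharp
using System;
using System.Text.RegularExpressions;
#nullable enable

// The answer's pattern with \K replaced by a capturing group (group 1)
// around the stricter day/month/year pattern
const string Pattern =
    @"(?s)^(?!(?:.*LT circa){2}).*LT circa\s*" +
    @"((?:0?[1-9]|[12]\d|3[01])\.(?:0?[1-9]|1[0-2])\.\d{4})(?!\d)";

// Hypothetical helper: the date, or null if there is no valid match
string? Extract(string text)
{
    Match m = Regex.Match(text, Pattern);
    return m.Success ? m.Groups[1].Value : null;
}

Console.WriteLine(Extract("* LT circa 17.05.2020 2 x 500 M\n"));           // 17.05.2020
Console.WriteLine(Extract("* LT circa 32.05.2020 2 x 500 M\n") ?? "NULL"); // NULL
```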