Regex for matching something preceded by chosen letters which is then replaced

Time:10-11

I'm trying to write a regex that matches Yo/yo and replaces it with Йо/йо or ЬО/ьо according to the rules below.

  • Replace Yo/yo (capitalized or lowercase) with Йо/йо if it is at the beginning of a word or is preceded by one of the letters а, ъ, о, у, е, и.
  • If the condition above is not met, replace Yo/yo with ЬО/ьо.

I believe the following regex would work but there are two issues:

  • how do I make it work for capital (Yo/Йо/Ьо) and non-capital (yo/йо/ьо)?
  • if the conditions for Йо/йо are not met, how do I replace with ЬО/ьо instead?
(?<=а|ъ|о|у|е|и)йо
Regex.Replace(text, "(?<=а|ъ|о|у|е|и)йо", " "); // ??
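
One way to cover both questions in a single pass is to let the pattern try the Йо/йо case first and fall back to the ЬО/ьо case, with a MatchEvaluator picking the capitalization. The following is only a minimal sketch under a few assumptions that are not in the question: "beginning of the word" is read as "start of the string or after any non-letter", uppercase vowels count as well as lowercase ones, and the names YoSketch, jo and soft are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class YoSketch
{
    // First alternative: Yo/yo at the start of the string, after a non-letter,
    // or after one of the listed vowels (uppercase vowels are an assumption).
    // Second alternative: every other Yo/yo.
    static readonly Regex YoRegex = new Regex(
        @"(?<=^|\P{L}|[аъоуеиАЪОУЕИ])(?<jo>[Yy])o|(?<soft>[Yy])o");

    public static string ReplaceYo(string text) =>
        YoRegex.Replace(text, m =>
        {
            if (m.Groups["jo"].Success)
            {
                // Word-initial or after a vowel: Йо / йо.
                return m.Groups["jo"].Value == "Y" ? "Йо" : "йо";
            }

            // Fallback rule from the question: ЬО / ьо.
            return m.Groups["soft"].Value == "Y" ? "ЬО" : "ьо";
        });
}
```

With this sketch, YoSketch.ReplaceYo("Раyoн") gives "Район" and YoSketch.ReplaceYo("Асансyoр") gives "Асансьор". Latin letters other than Yo/yo are left untouched, so "Trenyor" becomes "Trenьоr" rather than "Треньор".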

Test cases

[Theory]
[InlineData("Асансyoр", "Асансьор")]
[InlineData("Актyoр", "Актьор")]
[InlineData("Шофyoр", "Шофьор")]
[InlineData("Пощалyoн", "Пощальон")]
[InlineData("Trenyor", "Треньор")]
[InlineData("Булyoн", "Бульон")]
[InlineData("Бокyoр", "Бокьор")]
[InlineData("Сервитyoр", "Сервитьор")]
[InlineData("Раyoн", "Район")]
[InlineData("Маyoнеза", "Майонеза")]
[InlineData("Маyoр", "Майор")]
[InlineData("Yoрдан", "Йордан")]
[InlineData("Yoвка", "Йовка")]
public void ShouldReturnReplacedWord_WhenGivenWord(string word, string expected)

CodePudding user response:

This works for me with xUnit in LinqPad 7:

#load "xunit"

void Main()
{
    RunTests();  // Call RunTests() or press Alt+Shift+T to initiate testing.
}

static readonly Regex _yoRegex = new Regex( @"(?<pre>[\bayаъоуеи]?)(?<yo>[Yy]o)(?<post>\w+)?", RegexOptions.Compiled );

static String ReplaceYo( String input )
{
    if( _yoRegex.IsMatch( input ) )
    {
        String replaced = _yoRegex.Replace( input, YoMatchEvaluator );
        return replaced;
    }
    else
    {
        throw new InvalidOperationException( "Input did not match regex." );
//      return input;
    }
}

static String YoMatchEvaluator( Match match )
{
    String pre  = match.Groups["pre" ].Value;
    String yo   = match.Groups["yo"  ].Value;
    String post = match.Groups["post"].Value;
    
    Boolean isBeginningOfWord = ( pre.Length == 0 ) && ( match.Index == 0 );
    Boolean isPrecededByVowel = ( pre.Length == 1 ); // Note: inside a character class `\b` means backspace (U+0008), not a word boundary; a word-initial match has `pre.Length == 0`.
    Boolean isEndOfWord       = ( post.Length == 0 );
    
    if( isBeginningOfWord || isPrecededByVowel )
    {
        if( yo == "Yo" )
        {
            return pre + "Йо" + post;
        }
        else if( yo == "yo" )
        {
            return pre + "йо" + post;
        }
        else
        {
            throw new InvalidOperationException( "Unexpected \"Yo\" match: \"{0}\"".FmtInv( yo ) );
        }
    }
    else
    {
        if( yo == "Yo" )
        {
            return pre + "ЬО" + post;
        }
        else if( yo == "yo" )
        {
            return pre + "ьо" + post;
        }
        else
        {
            throw new InvalidOperationException( "Unexpected \"Yo\" match: \"{0}\"".FmtInv( yo ) );
        }
    }
}

static class MyExtensions
{
    public static String FmtInv( this String format, params Object?[]? args ) => String.Format( CultureInfo.InvariantCulture, format, args: args );
}

#region private::Tests

[Theory]
[InlineData(  1, "Асансyoр", "Асансьор")]
[InlineData(  2, "Актyoр", "Актьор")]
[InlineData(  3, "Шофyoр", "Шофьор")]
[InlineData(  4, "Пощалyoн", "Пощальон")]
//[InlineData( 5, "Trenyor", "Треньор")]
[InlineData(  5, "Trenyor", "Trenьоr")]
[InlineData(  6, "Булyoн", "Бульон")]
[InlineData(  7, "Бокyoр", "Бокьор")]
[InlineData(  8, "Сервитyoр", "Сервитьор")]
[InlineData(  9, "Раyoн", "Район")]
[InlineData( 10, "Маyoнеза", "Майонеза")]
[InlineData( 11, "Маyoр", "Майор")]
[InlineData( 12, "Yoрдан", "Йордан")]
[InlineData( 13, "Yoвка", "Йовка")]
[InlineData( 14, "Светлyo", "Светльо")]
public void ShouldReturnReplacedWord_WhenGivenWord( Int32 testCase, String word, String expected)
{
    String actual = ReplaceYo( word );
    Assert.Equal( expected: expected, actual: actual );
}

#endregion
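
A side note on the `\b` inside the character class of the pattern above: in .NET, `\b` within square brackets stands for the backspace character rather than a word boundary. The short check below is only an illustrative sketch (the class and method names are made up) showing that difference in isolation.

```csharp
using System;
using System.Text.RegularExpressions;

static class BackspaceClassCheck
{
    // Inside [...] the escape \b matches the backspace character U+0008.
    public static bool ClassMatchesBackspace() =>
        Regex.IsMatch("\u0008", @"^[\b]$");

    // It does not act as a zero-width word boundary there, so a class
    // containing only \b cannot match "at the start of a word".
    public static bool ClassActsAsWordBoundary() =>
        Regex.IsMatch("yo", @"^[\b]yo");
}
```

ClassMatchesBackspace() returns true and ClassActsAsWordBoundary() returns false, which is why the word-initial case in the answer is detected through `pre.Length == 0` and `match.Index == 0` rather than through the character class.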

Test results:

Input | Expected | Actual | Result
"Асансyoр" | "Асансьор" | "Асансьор" | Pass
"Актyoр" | "Актьор" | "Актьор" | Pass
"Шофyoр" | "Шофьор" | "Шофьор" | Pass
"Пощалyoн" | "Пощальон" | "Пощальон" | Pass
"Trenyor" | "Trenьоr" | "Trenьоr" | Pass
"Булyoн" | "Бульон" | "Бульон" | Pass
"Бокyoр" | "Бокьор" | "Бокьор" | Pass
"Сервитyoр" | "Сервитьор" | "Сервитьор" | Pass
"Раyoн" | "Район" | "Район" | Pass
"Маyoнеза" | "Майонеза" | "Майонеза" | Pass
"Маyoр" | "Майор" | "Майор" | Pass
"Yoрдан" | "Йордан" | "Йордан" | Pass
"Yoвка" | "Йовка" | "Йовка" | Pass
"Светлyo" | "Светльо" | "Светльо" | Pass