Remove "ال" from every word in an Arabic string

Time:11-15

I'm trying to remove "ال" from every word in an Arabic string that starts with it.

I tried the code below, but it only deletes "ال" from the first word:

input      : الغيث الغيث الغيث
output     : غيث الغيث الغيث
what I need: غيث غيث غيث

string[] prefixes = { "ال", "اَلْ", "الْ", "اَل" };

foreach (string prefix in prefixes)
{
    if (text.StartsWith(prefix))
    {
        text = text.Substring(prefix.Length);
        break;
    }
}

CodePudding user response:

If you want to strip the prefix from each word, rather than just call Replace on every occurrence anywhere in the text, you can use a regular expression that matches only at word boundaries, e.g.

using System.Text.RegularExpressions;

...

string input = "الغيث الغيث الغيث";
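// Longest prefixes first: alternation takes the first alternative that matches,
// so "ال" listed first would leave a stray sukun behind in "الْغيث".
string[] prefixes = { "اَلْ", "الْ", "اَل", "ال" };

// \b - word boundary - we are looking for prefixes only
string output = Regex.Replace(input, @$"\b({string.Join("|", prefixes)})", "");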

Let's have a look:

Console.Write(string.Join(Environment.NewLine, input, output));

Output:

الغيث الغيث الغيث
غيث غيث غيث
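
Regex alternation takes the first alternative that matches, so the order of the prefixes matters when one prefix is the beginning of another. The following is a minimal sketch of one way to make that order-independent, assuming a hypothetical helper class PrefixStripper with a method StripPrefixes (neither name comes from the answer above). It sorts the prefixes longest-first and escapes them before building the pattern:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PrefixStripper
{
    // Hypothetical helper: removes any of the given prefixes
    // from the start of each word in the text.
    public static string StripPrefixes(string text, params string[] prefixes)
    {
        // Longest alternatives first, so a short prefix cannot win
        // over a longer one that starts with the same characters.
        // Regex.Escape guards against prefixes that contain regex metacharacters.
        string alternatives = string.Join("|",
            prefixes.OrderByDescending(p => p.Length)
                    .Select(p => Regex.Escape(p)));

        // \b anchors the match at a word boundary, i.e. the start of a word.
        return Regex.Replace(text, @"\b(?:" + alternatives + ")", "");
    }
}
```

It could be called as PrefixStripper.StripPrefixes(input, prefixes) with the array from the snippet above, in whatever order the prefixes happen to be listed.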

CodePudding user response:

Try this regex:

\b\u0627(?:\u0644\u0652?|\u064e\u0644\u0652?)

See regex demo.

And this is the C# code that does what you want:

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string input = @"الغيث الغيث الغيث الغيث

اَلغيث اَلغيث اَلغيث اَلغيث

اَلْغيث اَلْغيث اَلْغيث اَلْغيث

الْغيث الْغيث الْغيث الْغيث
";

      string pattern = @"\b\u0627(?:\u0644\u0652?|\u064e\u0644\u0652?)";
      string replacement = "";
      string result = Regex.Replace(input, pattern, replacement);
      
      Console.WriteLine("Original String: {0}", input);
      Console.WriteLine("\n\n-----------------\n\n");
      Console.WriteLine("Replacement String: {0}", result);                             
   }
}

See C# code demo.
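
For readers who find the escaped pattern hard to follow, here is a sketch of the same pattern written with RegexOptions.IgnorePatternWhitespace so that each piece can carry a comment. The class name ArticleStripper and the method Strip are hypothetical names used only for illustration:

```csharp
using System.Text.RegularExpressions;

public static class ArticleStripper
{
    // Same pattern as above, spread over several lines with inline comments.
    private static readonly Regex Article = new Regex(@"
        \b              # word boundary: match only at the start of a word
        \u0627          # alef
        (?:
            \u0644 \u0652?            # lam, optionally followed by sukun
          | \u064E \u0644 \u0652?     # fatha, lam, optionally followed by sukun
        )",
        RegexOptions.IgnorePatternWhitespace);

    // Removes the matched prefix from the start of every word in the text.
    public static string Strip(string text)
    {
        return Article.Replace(text, "");
    }
}
```

Since both branches end in a lam with an optional sukun, the pattern can also be written more compactly as \b\u0627\u064E?\u0644\u0652? and should match the same four spellings.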
