Faster method to remove non-letter characters from string

Time:08-13

I want to remove all characters from a string except Unicode letters.

I am considering this code:

public static string OnlyLetters(string text)
{
    return new string (text.Where(c => Char.IsLetter(c)).ToArray());
}

But maybe Regex will be faster?

public static string OnlyLetters(string text)
{
    // verbatim string (@"...") so that \p is not treated as a C# escape sequence
    Regex rgx = new Regex(@"[^\p{L}]");
    return rgx.Replace(text, "");
}

Could you check both versions and suggest which one I should choose?

CodePudding user response:

If you want to know which horse is faster, you can run a race.

Manual character-by-character processing is often fast, so let's add this approach to the field:

private static string ManualReplace(string value)
{
  // let's allocate memory only once - value.Length characters
  StringBuilder sb = new StringBuilder(value.Length);

  foreach (char c in value)
    if (char.IsLetter(c))
      sb.Append(c);

  return sb.ToString();
}
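
Before racing, it is worth checking that the candidates actually return the same string. A minimal sketch, assuming a hypothetical LetterFilter class whose three methods (the names are mine, not from the question) wrap the LINQ, Regex and manual versions:

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical wrapper class; the method bodies mirror the snippets in this thread.
public static class LetterFilter
{
    // Verbatim string so that \p reaches the regex engine unchanged.
    private static readonly Regex NonLetters = new Regex(@"[^\p{L}]");

    public static string ViaLinq(string text) =>
        new string(text.Where(c => char.IsLetter(c)).ToArray());

    public static string ViaRegex(string text) =>
        NonLetters.Replace(text, "");

    public static string ViaLoop(string text)
    {
        // Capacity is an upper bound: the result is never longer than the input.
        var sb = new StringBuilder(text.Length);

        foreach (char c in text)
            if (char.IsLetter(c))
                sb.Append(c);

        return sb.ToString();
    }

    public static void Main()
    {
        string sample = "Hello, World! 123 Привет";

        string a = ViaLinq(sample);
        string b = ViaRegex(sample);
        string c = ViaLoop(sample);

        // Prints whether all three candidates agree on this sample.
        Console.WriteLine(a == b && b == c);
    }
}
```

Agreement on one sample is not a proof, but it catches obvious slips (such as a wrong character class) before any time is spent on measurements.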

Races:

// fixed seed (123) so that the generated text is the same on every run
Random random = new Random(123);

// let's compile the regex
Regex rgx = new Regex(@"[^\p{L}]", RegexOptions.Compiled);
string result = null; // <- keeps the compiler happy

string text = string.Concat(Enumerable
                            .Range(1, 10_000_000)
                            .Select(_ => (char)random.Next(32, 128)));

Stopwatch sw = new Stopwatch();

// warming: let .net compile IL, fill caches, allocate memory etc.
int warming = 5;

for (int i = 0; i < warming; ++i)
{
  if (i == warming - 1)
    sw.Start(); 

  // result = new string(text.Where(c => char.IsLetter(c)).ToArray());

  result = rgx.Replace(text, "");

  // result = string.Concat(text.Where(c => char.IsLetter(c)));

  // result = ManualReplace(text);

  if (i == warming - 1)
    sw.Stop();
}

Console.WriteLine($"{sw.ElapsedMilliseconds}");

Run this several times and you'll get the results. Mine (.NET 6, Release build) are:

new string    : 120 ms
rgx.Replace   : 350 ms
string.Concat : 150 ms
Manual        :  80 ms

So we have a winner: the manual loop. Among the others, new string(text.Where(c => char.IsLetter(c)).ToArray()) is the fastest, string.Concat is slightly slower, and Regex.Replace comes in last.
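
The race above times one candidate per run, switching between them by commenting lines in and out. A rough sketch of how all four could be timed in a single run, assuming a hypothetical Measure helper and Races class (neither is part of the original answer). The warm-up idea is the same: a few untimed runs, then one timed run. No numbers are shown because they depend on the machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static class Races
{
    private static readonly Regex NonLetters =
        new Regex(@"[^\p{L}]", RegexOptions.Compiled);

    public static string ManualReplace(string value)
    {
        var sb = new StringBuilder(value.Length);

        foreach (char c in value)
            if (char.IsLetter(c))
                sb.Append(c);

        return sb.ToString();
    }

    // Hypothetical helper: runs the candidate `warmups` times untimed,
    // then times one more run and returns the elapsed milliseconds.
    public static long Measure(Func<string, string> candidate, string text, int warmups = 4)
    {
        for (int i = 0; i < warmups; ++i)
            candidate(text);

        Stopwatch sw = Stopwatch.StartNew();
        candidate(text);
        sw.Stop();

        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        // Same input as in the race above: fixed seed, 10 million characters.
        var random = new Random(123);
        string text = string.Concat(Enumerable
            .Range(1, 10_000_000)
            .Select(_ => (char)random.Next(32, 128)));

        long viaNewString = Measure(t => new string(t.Where(c => char.IsLetter(c)).ToArray()), text);
        long viaRegex = Measure(t => NonLetters.Replace(t, ""), text);
        long viaConcat = Measure(t => string.Concat(t.Where(c => char.IsLetter(c))), text);
        long viaManual = Measure(ManualReplace, text);

        Console.WriteLine($"new string    : {viaNewString} ms");
        Console.WriteLine($"rgx.Replace   : {viaRegex} ms");
        Console.WriteLine($"string.Concat : {viaConcat} ms");
        Console.WriteLine($"Manual        : {viaManual} ms");
    }
}
```

A single timed run is still noisy, so treat the output as a ranking hint rather than a precise measurement; running it several times, as suggested above, still applies.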
