I want to remove all characters from a string except Unicode letters.
I am considering this code:
public static string OnlyLetters(string text)
{
    return new string(text.Where(c => Char.IsLetter(c)).ToArray());
}
But maybe Regex will be faster?
public static string OnlyLetters(string text)
{
    // verbatim string (@"...") so that \p is not treated as a C# escape sequence
    Regex rgx = new Regex(@"[^\p{L}]");
    return rgx.Replace(text, "");
}
Could you review both versions and suggest which one I should choose?
Answer:
If you want to know which horse is faster, you can hold a race.
Manual manipulation often turns out to be fast, so let's try this approach as well:
private static string ManualReplace(string value)
{
    // let's allocate memory only once - value.Length characters
    StringBuilder sb = new StringBuilder(value.Length);
    foreach (char c in value)
        if (char.IsLetter(c))
            sb.Append(c);
    return sb.ToString();
}
Races:
// fixed seed (123) so that the generated text is the same on every run
Random random = new Random(123);
// let's compile the regex
Regex rgx = new Regex(@"[^\p{L}]", RegexOptions.Compiled);
string result = null; // <- initialized to keep the compiler happy
// 10 million random ASCII characters with codes 32..127
string text = string.Concat(Enumerable
    .Range(1, 10_000_000)
    .Select(_ => (char)random.Next(32, 128)));
Stopwatch sw = new Stopwatch();
// warm-up: let .NET JIT-compile the IL, fill caches, allocate memory, etc.
int warming = 5;
for (int i = 0; i < warming; ++i)
{
    // only the last iteration is timed
    if (i == warming - 1)
        sw.Start();
    // keep exactly one candidate uncommented per run
    // result = new string(text.Where(c => char.IsLetter(c)).ToArray());
    result = rgx.Replace(text, "");
    // result = string.Concat(text.Where(c => char.IsLetter(c)));
    // result = ManualReplace(text);
    if (i == warming - 1)
        sw.Stop();
}
Console.WriteLine($"{sw.ElapsedMilliseconds}");
Run this several times and you'll get the results. Mine (.NET 6, Release build) are:
new string : 120 ms
rgx.Replace : 350 ms
string.Concat : 150 ms
Manual : 80 ms
So we have a winner: ManualReplace. Among the others, new string(text.Where(c => char.IsLetter(c)).ToArray()) is the fastest, string.Concat is slightly slower, and Regex.Replace is the loser.
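If you would rather not edit comments between runs, the race can be wrapped in a small helper that times every candidate in one go. This is only a sketch under the same setup as above: the class name Races, the helper Measure and its warmups parameter are hypothetical names, and the numbers it prints will vary by machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static class Races
{
    private static readonly Regex NonLetters =
        new Regex(@"[^\p{L}]", RegexOptions.Compiled);

    public static string ManualReplace(string value)
    {
        StringBuilder sb = new StringBuilder(value.Length);
        foreach (char c in value)
            if (char.IsLetter(c))
                sb.Append(c);
        return sb.ToString();
    }

    // Hypothetical helper: runs the candidate `warmups` times untimed,
    // then times one more run and returns the elapsed milliseconds.
    public static long Measure(Func<string, string> candidate, string text, int warmups = 4)
    {
        for (int i = 0; i < warmups; ++i)
            candidate(text);

        Stopwatch sw = Stopwatch.StartNew();
        candidate(text);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        // Same seeded input as in the race above.
        Random random = new Random(123);
        string text = string.Concat(Enumerable
            .Range(1, 10_000_000)
            .Select(_ => (char)random.Next(32, 128)));

        var candidates = new (string Name, Func<string, string> Run)[]
        {
            ("new string", t => new string(t.Where(c => char.IsLetter(c)).ToArray())),
            ("rgx.Replace", t => NonLetters.Replace(t, "")),
            ("string.Concat", t => string.Concat(t.Where(c => char.IsLetter(c)))),
            ("Manual", t => ManualReplace(t)),
        };

        foreach (var (name, run) in candidates)
            Console.WriteLine($"{name,-14}: {Measure(run, text)} ms");
    }
}
```

Four warm-up runs plus one timed run mirrors the warming = 5 loop above; as there, it is best to build in Release and repeat the whole run a few times before trusting the ranking.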