Is there any way to make Search and addToSearch faster? I am trying to make it faster. I am not sure if regex in addtosearch can be a problem, it is really small. I am out ofideas how to optimize it further. Now i am just trying to meet word count. I wonder if there is a way to concatenate parts of name that are not empty more effectivly than i do.
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System;
namespace AutoComplete
{
public struct FullName
{
public string Name;
public string Surname;
public string Patronymic;
}
public class AutoCompleter
{
private List<string> listOfNames = new List<string>();
private static readonly Regex sWhitespace = new Regex(@"\s ");
public void AddToSearch(List<FullName> fullNames)
{
foreach (FullName i in fullNames)
{
string nameToAdd = "";
if (!string.IsNullOrWhiteSpace(i.Surname))
{
nameToAdd = sWhitespace.Replace(i.Surname, "") " ";
}
if (!string.IsNullOrWhiteSpace(i.Name))
{
nameToAdd = sWhitespace.Replace(i.Name, "") " ";
}
if (!string.IsNullOrWhiteSpace(i.Patronymic))
{
nameToAdd = sWhitespace.Replace(i.Patronymic, "") " ";
}
listOfNames.Add(nameToAdd.Substring(0, nameToAdd.Length - 1));
}
}
public List<string> Search(string prefix)
{
if (prefix.Length > 100 || string.IsNullOrWhiteSpace(prefix))
{
throw new System.Exception();
}
List<string> namesWithPrefix = new List<string>();
foreach (string name in listOfNames)
{
if (IsPrefix(prefix, name))
{
namesWithPrefix.Add(name);
}
}
return namesWithPrefix;
}
private bool IsPrefix(string prefix, string stringToSearch)
{
if (stringToSearch.Length < prefix.Length)
{
return false;
}
for (int i = 0; i < prefix.Length; i )
{
if (prefix[i] != stringToSearch[i])
{
return false
}
}
return true
}
}
}
CodePudding user response:
Regular expression (Regexp) are great because of their ease-of use and flexibility but most Regexp engines are actually quite slow. This is the case for the one of C#. Moreover, strings can contain Unicode character and "\s" needs to consider all the (fancy) spaces characters included in the Unicode character set. This make Regexp search/replace much slower. If you know your input does not contain such characters (eg. ASCII), then you can write a much faster implementation. Alternatively, you can play with RegexpOptions
like Compiled
and CultureInvariant
so to reduce a bit the run time.
The AddToSearch
performs many hidden allocations. Indeed, =
create a new string (because C# string are immutable and not designed to be often resized) and Replace
calls does allocate new strings too. You can speed up the computation by directly replace and write the result in a preallocated buffer and simply copy the result with a Substring
like you currently do.
Search
is fine and it is not easy to optimize it. That being said, if listOfNames
is big, then you can use multiple threads so to significantly speed up the computation. Be careful though because Add
is not thread-safe. Parallel linkq may help you to do that easily (I never tested it though).
Another solution to speed up a bit the computation of Search
is to start the loop of IsPrefix
from prefix.Length-1
. Indeed, if most string contains the beginning of the prefix, then a significant portion of the time will be spend comparing nearly equal characters. The probability that prefix[prefix.Length-1] != stringToSearch[prefix.Length-1]
is higher than prefix[0] != stringToSearch[0]
. Additionally, partial loop unrolling may help a bit to speed up the function if the JIT is not able to do that.
CodePudding user response:
Others have already pointed out that the use of regex can be problematic. I would personally consider using str.Replace(" ", String.Empty) - if I understood the regex correctly; I normally try to avoid regex as I have a hard time reading code using regex. Note that String.Empty does not allocate a new string.
That said, I think performance could boost if you would not store the names in a List but at least order the list alpabetically. Thus you do not need to iterate all elemnts of the list but e.g. use binary search to find all elements matching a given prefix - as range within the list of names you already have.