I have the following method to clean up strings:

    using System.Collections.Generic;
    using System.Linq;
    using System.Text;

    public static String UseStringBuilderWithHashSet(string strIn)
    {
        var hashSet = new HashSet<char>("?&^$#@!() -,:;<>’\'-_*");
        // specify capacity of StringBuilder to avoid resizing
        StringBuilder sb = new StringBuilder(strIn.Length);
        foreach (char x in strIn.Where(c => !hashSet.Contains(c)))
        {
            sb.Append(x);
        }
        return sb.ToString();
    }

However, strings such as "[MV] REOL ちるちる ChiruChiru" or "[MV] REOL ヒビカセ Hibikase" do not get cleaned up: the Japanese characters are not removed.
How can I modify my method so that it turns one of the above strings into, for example, "[MV] REOL ChiruChiru"?
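Running the method on one of the sample titles makes the problem concrete. The sketch below wraps the method, unchanged apart from formatting, in a hypothetical class named BlacklistCleaner. Because the HashSet lists only punctuation, the Japanese characters pass straight through; and because the set also contains a space, the spaces between words are stripped as well.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical wrapper class: it only exists so the question's method
// can be called and its output inspected.
public static class BlacklistCleaner
{
    public static string UseStringBuilderWithHashSet(string strIn)
    {
        // Blacklist of characters to drop. Note that it includes ' ' (space),
        // but nothing outside ASCII punctuation apart from the curly apostrophe.
        var hashSet = new HashSet<char>("?&^$#@!() -,:;<>’\'-_*");
        // Pre-size the builder to the input length to avoid resizing.
        var sb = new StringBuilder(strIn.Length);
        foreach (char x in strIn.Where(c => !hashSet.Contains(c)))
        {
            sb.Append(x);
        }
        return sb.ToString();
    }
}
```

Calling BlacklistCleaner.UseStringBuilderWithHashSet("[MV] REOL ちるちる ChiruChiru") returns "[MV]REOLちるちるChiruChiru": the unwanted characters stay and the wanted spaces go.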
Answer:
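You're trying to solve this exhaustively by filtering out everything you don't want. That approach doesn't scale, because there are far too many characters to list: a C# char can hold any of 65,536 UTF-16 code units, and Unicode assigns well over 100,000 characters.
You will likely get better results if you accept only the characters you do want (a whitelist).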
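The whitelist idea does not strictly need regular expressions. As a minimal sketch, the hypothetical WhitelistCleaner.KeepAllowed below keeps only ASCII letters, square brackets, and whitespace, which is approximately the set matched by the character class [a-zA-Z\[\]\s]; char.IsWhiteSpace and the regex \s are close but not guaranteed to agree on every Unicode space character.

```csharp
using System.Text;

// Hypothetical helper illustrating a whitelist filter with plain char checks.
public static class WhitelistCleaner
{
    // True only for characters we explicitly want to keep:
    // ASCII letters, '[' and ']', and any whitespace.
    private static bool IsAllowed(char c) =>
        (c >= 'a' && c <= 'z') ||
        (c >= 'A' && c <= 'Z') ||
        c == '[' || c == ']' ||
        char.IsWhiteSpace(c);

    public static string KeepAllowed(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (IsAllowed(c))
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

For "[MV] REOL ちるちる ChiruChiru" this returns "[MV] REOL  ChiruChiru", with two spaces where ちるちる used to be, which is why a whitespace-collapsing step is still needed afterwards.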

    using System.Text;
    using System.Text.RegularExpressions;

    public static string CleanInput(string input)
    {
        // a-zA-Z allows any English alphabet character, upper or lower case
        // \[ and \] allow [ and ]
        // \s allows whitespace
        var regex = new Regex(@"[a-zA-Z\[\]\s]");
        var stringBuilder = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (regex.IsMatch(c.ToString()))
            {
                stringBuilder.Append(c);
            }
        }
        string output = stringBuilder.ToString();
        // \s+ matches any run of one or more whitespace characters
        // and replaces it with a single space.
        return Regex.Replace(output, @"\s+", " ");
    }
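For comparison, the same filter can be written as one Regex.Replace with a negated character class, followed by the whitespace collapse. This is a sketch rather than a drop-in replacement: the class name RegexCleaner is hypothetical, and the final Trim call is an addition not present in the method above, removing any space left at either end of the result.

```csharp
using System.Text.RegularExpressions;

// Hypothetical class showing a two-regex version of the whitelist cleaner.
public static class RegexCleaner
{
    // Matches any character that is NOT an ASCII letter, '[', ']' or whitespace.
    private static readonly Regex Disallowed = new Regex(@"[^a-zA-Z\[\]\s]");

    // Matches a run of one or more consecutive whitespace characters.
    private static readonly Regex WhitespaceRun = new Regex(@"\s+");

    public static string CleanInput(string input)
    {
        // Delete everything outside the whitelist in a single pass.
        string kept = Disallowed.Replace(input, "");
        // Collapse whitespace runs to one space, then trim the ends.
        return WhitespaceRun.Replace(kept, " ").Trim();
    }
}
```

With the two sample titles this yields "[MV] REOL ChiruChiru" and "[MV] REOL Hibikase". Keeping the Regex objects in static readonly fields avoids constructing a new one on every call, and matching the whole string at once avoids allocating a one-character string per input character.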