I need to remove certain words from a text, together with the separators that follow them. The problem is that my program removes only one separator after each word, but there can be many. How can I remove the remaining separators as well? I also need to make sure the word is not attached to other letters: for example, if the word appears as fHouse or Housef, it should not be removed.
At the moment I have:
public static void Process(string fin, string fout)
{
    using (var foutv = File.CreateText(fout)) // fout - OutPut.txt
    {
        using (StreamReader reader = new StreamReader(fin)) // fin - InPut.txt
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string[] WordsToRemove = { "Home", "House", "Room" };
                char[] seperators = { ';', ' ', '.', ',', '!', '?', ':' };
                foreach (string word in WordsToRemove)
                {
                    foreach (char seperator in seperators)
                    {
                        // Removes the word plus exactly one trailing separator
                        line = line.Replace(word + seperator, string.Empty);
                    }
                }
                foutv.WriteLine(line);
            }
        }
    }
}
Input:
fhgkHouse!House!Dog;;;!!Inside!C!Room!Home!House!Room;;;;;;;;;;!Table!London!Computer!Room;..;
Results I get:
fhgkDog;;;!!Inside!C!;;;;;;;;;!Table!London!Computer!..;
The results should be:
fhgkHouse!Dog;;;!!Inside!C!Table!London!Computer!
CodePudding user response:
Try this regex: \b(Home|House|Room)(!|;)*\b|;\.\.;
See it at: https://regex101.com/r/LUsyM8/1
There, I replace the matched words and the special characters after them with an empty string.
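To use the same substitution from C#, something like the following should work. This is only a sketch: the class and method names (AnswerRegexDemo, Clean) are made up for illustration, and it assumes the last alternative of the pattern is meant to match the literal text ;..; with no space in it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnswerRegexDemo
{
    // Pattern from this answer. Assumption: the second alternative matches
    // the literal ";..;" left over at the end of the sample line.
    private const string Pattern = @"\b(Home|House|Room)(!|;)*\b|;\.\.;";

    // Replaces every match with an empty string, as on regex101.
    public static string Clean(string line)
    {
        return Regex.Replace(line, Pattern, string.Empty);
    }
}
```

In the Process method from the question, the two foreach loops could then be replaced by line = AnswerRegexDemo.Clean(line); before writing the line out.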
As far as I can tell, it produces the expected result.
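If hard-coding the leftover separators is not desirable, a more general variant is possible. The sketch below is an alternative, not the pattern above: it builds the regex from the word and separator lists in the question, uses lookarounds so that a word with a letter directly before or after it (fHouse, Housef) is left alone, and then consumes any run of separators after the word. The names WordRemover and RemoveWords are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordRemover
{
    // Same lists as in the question.
    private static readonly string[] WordsToRemove = { "Home", "House", "Room" };
    private static readonly char[] Separators = { ';', ' ', '.', ',', '!', '?', ':' };

    private static readonly Regex Pattern = BuildPattern();

    private static Regex BuildPattern()
    {
        // Home|House|Room, escaped in case a word contains regex metacharacters.
        string words = string.Join("|", WordsToRemove.Select(w => Regex.Escape(w)));

        // Character class body for the separators. Note: Regex.Escape does not
        // escape ']', '^' or '-', so those would need manual handling if added.
        string separators = string.Concat(
            Separators.Select(c => Regex.Escape(c.ToString())));

        // (?<!\p{L}) : no letter directly before the word
        // (?!\p{L})  : no letter directly after the word
        // [...]*     : every separator that follows, not just one
        return new Regex(
            @"(?<!\p{L})(?:" + words + @")(?!\p{L})[" + separators + "]*");
    }

    public static string RemoveWords(string line)
    {
        return Pattern.Replace(line, string.Empty);
    }
}
```

With this in place, the loop body in Process could shrink to foutv.WriteLine(WordRemover.RemoveWords(line));. Only letters count as "connected" here; if digits or underscores should also block removal, the lookarounds would need to be adjusted.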