Removing words from text together with the separators after them (using Regex or List)

Time:11-22

I need to remove words from the text together with the separators that follow them. The problem is that the program only removes one separator after the word, but there can be many. Any suggestions on how to remove the remaining separators? I also need to make sure the word is not attached to other letters: for example, if the word is fHouse or Housef, it should not be removed.

At the moment I have:

public static void Process(string fin, string fout)
{
    using (var foutv = File.CreateText(fout)) //fout - OutPut.txt
    {
        using (StreamReader reader = new StreamReader(fin)) // fin - InPut.txt
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string[] WordsToRemove = { "Home", "House", "Room" };
                char[] seperators = {';', ' ', '.', ',', '!', '?', ':'};
                foreach(string word in WordsToRemove)
                {
                    foreach (char seperator in seperators)
                    {
                        line = line.Replace(word + seperator, string.Empty);
                    }
                }
                foutv.WriteLine(line);
            }
        }
    }
}

My input:

fhgkHouse!House!Dog;;;!!Inside!C!Room!Home!House!Room;;;;;;;;;;!Table!London!Computer!Room;..;

The result I get:

fhgkDog;;;!!Inside!C!;;;;;;;;;!Table!London!Computer!..;

The result should be:

fhgkHouse!Dog;;;!!Inside!C!Table!London!Computer!
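Since the title also asks about a List-based approach, here is a minimal non-regex sketch. It scans each line, collects every run of letters as one token, and drops a token only if it exactly matches one of the words, together with all separators directly after it. The class `WordScanner` and method `RemoveWords` are hypothetical names, and a "word" is assumed to be a run of `char.IsLetter` characters, so "fhgkHouse" is a single token and is kept.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: removes whole words plus every separator that directly
// follows them, without using regex.
public static class WordScanner
{
    public static string RemoveWords(string line, ISet<string> wordsToRemove, ISet<char> separators)
    {
        var result = new StringBuilder(line.Length);
        int i = 0;
        while (i < line.Length)
        {
            if (!char.IsLetter(line[i]))
            {
                // Not part of a word: copy the character through unchanged.
                result.Append(line[i]);
                i++;
                continue;
            }

            // Collect the whole run of letters, so "fhgkHouse" is one token, not "House".
            int start = i;
            while (i < line.Length && char.IsLetter(line[i]))
            {
                i++;
            }
            string token = line.Substring(start, i - start);

            if (wordsToRemove.Contains(token))
            {
                // Drop the word and skip all separators immediately after it.
                while (i < line.Length && separators.Contains(line[i]))
                {
                    i++;
                }
            }
            else
            {
                result.Append(token);
            }
        }
        return result.ToString();
    }
}
```

Called with `new HashSet<string> { "Home", "House", "Room" }` and a `HashSet<char>` built from the separator array in the question, it turns the sample input into the expected line. Constructing the word set with `new HashSet<string>(StringComparer.OrdinalIgnoreCase)` would make the match case-insensitive.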

Answer:

Try this regex: \b(Home|House|Room)(!|;)*\b|;+\.\.;

You can try it at: https://regex101.com/r/LUsyM8/1

There, every match (a word plus its trailing separators) is replaced with an empty string.
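To use the pattern from C#, it can be passed to `Regex.Replace` with an empty replacement. This is a minimal sketch assuming the final alternative is `;+\.\.;` (one or more semicolons followed by `..;`); the class name `AnswerRegex` is hypothetical. Inside the question's `while` loop, `line = AnswerRegex.Clean(line);` would replace the two nested `foreach` loops.

```csharp
using System.Text.RegularExpressions;

public static class AnswerRegex
{
    // Verbatim string so the backslashes reach the regex engine unchanged.
    // Alternative 1: Home/House/Room starting at a word boundary, followed by as
    //   many '!' or ';' as possible such that a word boundary still follows.
    // Alternative 2 (assumed reading of the answer): one or more ';' followed by "..;".
    private static readonly Regex Pattern = new Regex(@"\b(Home|House|Room)(!|;)*\b|;+\.\.;");

    // Replaces every match with an empty string.
    public static string Clean(string line) => Pattern.Replace(line, string.Empty);
}
```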

This produces the expected result for the sample input.
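The pattern above only allows `!` and `;` after a word, and its second alternative exists to catch the `;..;` tail of this particular input. A more general sketch, assuming the same word and separator lists as the question's code, builds the pattern from those arrays: `\b` on both sides of the word keeps "fHouse" and "Housef" intact, and a repeated group of escaped separators consumes every separator that follows. The names `WordRemover` and `Clean` are hypothetical; `Process` mirrors the question's signature but uses `File.ReadLines`/`File.WriteAllLines` instead of a `StreamReader`, and assumes `fin` and `fout` are different files.

```csharp
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordRemover
{
    private static readonly string[] WordsToRemove = { "Home", "House", "Room" };
    private static readonly char[] Separators = { ';', ' ', '.', ',', '!', '?', ':' };

    // Resulting pattern: \b(?:Home|House|Room)\b(?:;|\ |\.|,|!|\?|:)*
    // Regex.Escape makes characters such as '.' and '?' literal.
    private static readonly Regex Pattern = new Regex(
        @"\b(?:" + string.Join("|", WordsToRemove.Select(Regex.Escape)) + @")\b" +
        "(?:" + string.Join("|", Separators.Select(c => Regex.Escape(c.ToString()))) + ")*");

    // Removes each listed word together with all separators that follow it.
    public static string Clean(string line) => Pattern.Replace(line, string.Empty);

    // Reads fin line by line and writes each cleaned line to fout.
    public static void Process(string fin, string fout)
    {
        File.WriteAllLines(fout, File.ReadLines(fin).Select(Clean));
    }
}
```

Because `\b` treats letters, digits and `_` as word characters, a token such as `House1` is also left alone; whether that is desirable depends on the data.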
