Remove duplicate <div> and </div> tags from a given string

I have the following string with several useless <div> and </div> tags.

<div style="xpto">
   <div>
      <div>
         <div>
            <div style="xpto">
               <div>
                  <div>
                     <div style="xpto">
                        <div>
                           <div></div>
                        </div>
                     </div>
                  </div>
               </div>
            </div>
         </div>
      </div>
   </div>
</div>

How can I turn it into the following string, using C#, so that there are no useless <div> tags and their corresponding </div> tags?

<div style="xpto">
   <div>
      <div style="xpto">
         <div>
            <div style="xpto">
               <div></div>
            </div>
         </div>
      </div>
   </div>
</div>

CodePudding user response：

Just solved this issue with the following recursive function.

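// Assumes the tags are adjacent, with no whitespace between them, and that
// the <div> elements form a single nested chain, as in the question.
private string RemoveDuplicateDivs(string texto)
{
    // First place where a bare <div> directly wraps another bare <div>.
    var indexOfStartDiv = texto.IndexOf("<div><div>");

    // IndexOf returns -1 when nothing is found; 0 is a valid position.
    if (indexOfStartDiv >= 0)
    {
        var indexOfEndDiv = texto.LastIndexOf("</div></div>");

        if (indexOfEndDiv > indexOfStartDiv)
        {
            // Drop one "<div>" (5 chars). The closing pair then sits 5 chars
            // earlier, so drop one "</div>" (6 chars) at that position.
            var result = texto.Remove(indexOfStartDiv, 5).Remove(indexOfEndDiv - 5, 6);
            return RemoveDuplicateDivs(result);
        }
    }

    return texto;
}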
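
The function above searches for the literal text "<div><div>", so it only fires when the tags are directly adjacent. The string in the question has line breaks and indentation between the tags, so a whitespace-stripping step is probably needed first. The following is a minimal sketch of that idea, written as a loop instead of recursion. The names DivChain, StripWhitespaceBetweenTags and Collapse are made up for illustration, and the sketch still assumes a single chain of nested <div> elements as in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DivChain
{
    // Hypothetical helper: removes whitespace that sits between two tags.
    public static string StripWhitespaceBetweenTags(string markup) =>
        Regex.Replace(markup, @">\s+<", "><");

    // Iterative variant of the recursive function: on each round, drop one
    // "<div>" from the first "<div><div>" and one "</div>" from the last
    // "</div></div>", until no such pair is left.
    public static string Collapse(string markup)
    {
        string text = StripWhitespaceBetweenTags(markup.Trim());
        while (true)
        {
            int start = text.IndexOf("<div><div>", StringComparison.Ordinal);
            int end = text.LastIndexOf("</div></div>", StringComparison.Ordinal);

            // Nothing left to collapse (or the markup is not a simple chain).
            if (start < 0 || end <= start)
            {
                return text;
            }

            // Removing 5 chars at 'start' shifts the closing pair left by 5.
            text = text.Remove(start, 5).Remove(end - 5, 6);
        }
    }
}
```

Calling DivChain.Collapse on the indented string returns the markup on a single line, since the whitespace between tags is removed before collapsing.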

CodePudding user response：

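To remove the redundant <div> and </div> tags from a given string, you can use a regular expression that matches an attribute-less <div> whose only content is another attribute-less <div>, and replace the match with the inner element. Because each closing tag has to be paired with the right opening tag, the pattern below uses .NET balancing groups to track the nesting depth. Here is an example in C# that demonstrates this approach:

using System;
using System.Text.RegularExpressions;

namespace RemoveDuplicateTags
{
    class Program
    {
        // Matches a bare <div> whose only content is another bare <div>...</div>.
        private static readonly Regex RedundantDiv = new Regex(
            @"<div>\s*" +                 // redundant wrapper
            @"(?<inner><div>" +           // inner <div> that is kept
            @"(?>[^<]+" +                 //   text between tags
            @"|<div\b[^>]*>(?<depth>)" +  //   nested <div>: push
            @"|</div>(?<-depth>)" +       //   nested </div>: pop
            @"|<(?!/?div\b)[^>]*>)*" +    //   any other tag
            @"(?(depth)(?!))</div>)" +    // depth must be back to zero
            @"\s*</div>");                // closing tag of the wrapper

        static string RemoveDuplicateDivs(string input)
        {
            string previous;
            do
            {
                previous = input;
                // Keep only the inner <div>, dropping the wrapper around it.
                input = RedundantDiv.Replace(input, "${inner}");
            } while (input != previous);
            return input;
        }

        static void Main(string[] args)
        {
            string input = "<div style='xpto'>" +
                           "   <div>" +
                           "      <div>" +
                           "         <div>" +
                           "            <div style='xpto'>" +
                           "               <div>" +
                           "                  <div>" +
                           "                     <div style='xpto'>" +
                           "                        <div>" +
                           "                           <div></div>" +
                           "                        </div>" +
                           "                     </div>" +
                           "                  </div>" +
                           "               </div>" +
                           "            </div>" +
                           "         </div>" +
                           "      </div>" +
                           "   </div>" +
                           "</div>";

            Console.WriteLine(RemoveDuplicateDivs(input));
        }
    }
}

This code applies Regex.Replace repeatedly until the string stops changing. Each match removes one redundant wrapper <div> together with its matching </div>, while <div> tags that carry attributes (such as style) are left alone. Whitespace between the remaining tags is kept as it was, so the result has the desired structure but is not re-indented.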
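
As an alternative to pattern matching on the raw string, the markup can be parsed into a tree and collapsed there. The sketch below swaps the regular expression for System.Xml.Linq, so it assumes the input is well-formed XML with a single root element, which holds for the string in the question but not for arbitrary HTML. DivCollapser, Collapse and IsBareDiv are hypothetical names chosen for this example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class DivCollapser
{
    // Hypothetical helper: true for a <div> element that has no attributes.
    private static bool IsBareDiv(XElement e) =>
        e.Name.LocalName == "div" && !e.HasAttributes;

    public static string Collapse(string markup)
    {
        // Assumption: well-formed XML. Whitespace-only text between tags
        // is dropped by the default load options.
        XElement root = XElement.Parse(markup);

        while (true)
        {
            // A wrapper is a bare <div> whose only child node is a bare <div>.
            XElement wrapper = root.DescendantsAndSelf()
                .FirstOrDefault(e => IsBareDiv(e)
                    && e.Nodes().Count() == 1
                    && e.FirstNode is XElement inner
                    && IsBareDiv(inner));

            if (wrapper == null)
            {
                break;
            }

            XElement child = (XElement)wrapper.FirstNode;
            if (wrapper == root)
            {
                // The root itself is redundant: promote its child.
                child.Remove();
                root = child;
            }
            else
            {
                // Put the child where the wrapper used to be.
                wrapper.ReplaceWith(child);
            }
        }

        // Serialize without re-indenting; drop the argument for pretty output.
        return root.ToString(SaveOptions.DisableFormatting);
    }
}
```

Working on the tree means indentation in the input does not matter, and a <div> with several children or with text of its own is never treated as a redundant wrapper.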