Retrieving a set of XML nodes from a plain text file

I have a plain text file with the following content:

<body labelR={Right} LabelL={Left}> </body/> Video provides a powerful way to help you prove your point. When you click Online Video, you can paste in the embed code for the video you want to add. You can also type a keyword to search online for the video that best fits your document. <body TestR={TestRight} TestL={TestLeft}> </body/>

It is read from the file system like this:

var plainText = File.ReadAllText(@"D:\TestTxt.txt");

I'm trying to figure out whether there is a way to filter the text and get a list of the elements that are written in XML-like syntax. The desired outcome is as below:

A list of 2 items in this case, containing:

<body labelR={Right} LabelL={Left}>
</body/>
<body TestR={TestRight} TestL={TestLeft}>
</body/>

Basically the XML elements with <body> </body>

I cannot use LINQ to XML here because this plain text content is not valid XML. I have read that a regex might work, but I'm not sure of the proper way to use it here.

Any advice is greatly appreciated.

CodePudding user response：

I think the best way to handle this is to convert your text file into an XML file with a little code. Then you can read it easily.

this and this will help you to do that.

using System;
using System.Xml;

// Reads the converted XML file; the case labels are placeholder element names.
using (XmlReader reader = XmlReader.Create(@"YOUFILEPATH.xml"))
{
    while (reader.Read())
    {
        // Act only when the reader is positioned on a start tag.
        if (reader.IsStartElement())
        {
            switch (reader.Name)
            {
                case "Key":
                    Console.WriteLine("Element  tag name is: " + reader.ReadString());
                    break;
                case "Element value is: ":
                    Console.WriteLine("Your Location is : " + reader.ReadString());
                    break;
            }
        }
        Console.WriteLine("");
    }
}

CodePudding user response：

A plain string-based solution could be:

var s = "<body labelR={Right} LabelL={Left}> </body/> Video provides ... your document. <body TestR={TestRight} TestL={TestLeft}> </body/>";
int start = 0;
while ((start = s.IndexOf("<body", start)) >= 0)
{
    var end = s.IndexOf("</body/>", start + "<body".Length) + "</body/>".Length;
    Console.WriteLine(s[start..end]);
    start = end;
}

This finds the next <body starting from the previous "node" (if any). Then it finds the (end of the) next </body/>. Finally it prints the substring.

This repeats until no further start marker is found, so it prints:

<body labelR={Right} LabelL={Left}> </body/>
<body TestR={TestRight} TestL={TestLeft}> </body/>

You may want to add some checks - what if the end marker is missing?
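
One possible way to add that check, as a minimal sketch: if IndexOf returns -1 for the end marker, the loop stops instead of computing an end offset from -1. The class name BodyTagExtractor and the method name Extract are made up for illustration, and the markers are taken literally from the sample text in the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of any library.
public static class BodyTagExtractor
{
    // Returns every "<body ... </body/>" span found in the text, in order.
    public static List<string> Extract(string text)
    {
        const string startMarker = "<body";
        const string endMarker = "</body/>";
        var results = new List<string>();
        int start = 0;

        while ((start = text.IndexOf(startMarker, start, StringComparison.Ordinal)) >= 0)
        {
            int endIndex = text.IndexOf(endMarker, start + startMarker.Length, StringComparison.Ordinal);
            if (endIndex < 0)
            {
                // No end marker after this start marker: stop here rather
                // than building a substring from an invalid offset.
                break;
            }

            int end = endIndex + endMarker.Length;
            results.Add(text.Substring(start, end - start));
            start = end; // continue searching after the extracted span
        }

        return results;
    }
}
```

Calling BodyTagExtractor.Extract(plainText) on the sample text from the question should give a list with the two <body ...> </body/> strings. An unterminated trailing <body ...> is silently skipped here; throwing an exception instead may be the better choice if a missing end marker means the input is corrupt.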