I can't parse this html code using C# agility pack

Time:11-20

I've been trying to parse this code for a very long time:

<html>
<body class="detailpage">
    <div id="innerLayout">
        <section id="body-container">
            <div class="wrapper">
                <div class="content" id="offer_active">
                    <div class="clr offerbody">
                        <div class="offercontent fleft rel ">
                            <div class="offercontentinner">
                                <script>
                                    texto = {"name":"John"};
                                </script>
                            </div>
                        </div>
                    </div>
                </div>
            </div>
        </section>
        
    </div>
</body>
</html>

I prefer using AgilityPack, and I want to get "name" : "John" as a result, but I have not been successful.

This is my attempt:

string stringThatKeepsYourHtml = @"<!DOCTYPE html> <head> <title>Title</title> </head> <body> <div id=""myId"" myClass""> <div myClass"">hello</div> </div> </body> </html>"; 
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(stringThatKeepsYourHtml); 
string whatUrLookingFor = doc.DocumentNode.
    SelectNodes("//div").
    First().
    SelectNodes("//div").
    First().
    InnerText; 
Console.WriteLine(whatUrLookingFor); 
Console.ReadKey(true);

How can I get this working?

CodePudding user response:

I'm not sure what the problem with parsing it is. This worked fine:


        var html = @"
<html>
<body class=""detailpage"">
    <div id=""innerLayout"">
        <section id=""body-container"">
            <div class=""wrapper"">
                <div class=""content"" id=""offer_active"">
                    <div class=""clr offerbody"">
                        <div class=""offercontent fleft rel "">
                            <div class=""offercontentinner"">
                                <script>
                                    texto = {""name"":""John""};
                                </script>
                            </div>
                        </div>
                    </div>
                </div>
            </div>
        </section>
        
    </div>
</body>
</html>";

        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        string scr = htmlDoc.DocumentNode.SelectSingleNode("//script").InnerText;
        
        Console.WriteLine(scr);

scr contains the full script text, texto = {"name":"John"}; (plus the surrounding whitespace). You can strip the texto = prefix and parse the remainder as JSON, or just take everything between { and } with a substring, for example:

var openBra = scr.IndexOf('{');
var closeBra = scr.LastIndexOf('}');
var between = scr[(openBra + 1)..closeBra]; // C# 8 range syntax; use Substring on older versions
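
For the JSON-parsing route mentioned above, here is a minimal sketch using System.Text.Json. The class and method names (ScriptJson, ReadStringProperty) are made up for illustration, and it assumes the object literal in the script is strict JSON (double-quoted keys and strings). A looser JavaScript literal would make JsonDocument.Parse throw a JsonException.

```csharp
using System;
using System.Text.Json;

public static class ScriptJson
{
    // Hypothetical helper: finds the {...} literal in a script such as
    // "texto = {...};" and reads one string property from it.
    // Returns null if no literal or no matching string property is found.
    public static string? ReadStringProperty(string script, string property)
    {
        int open = script.IndexOf('{');
        int close = script.LastIndexOf('}');
        if (open < 0 || close <= open)
        {
            return null; // no object literal in the script text
        }

        // Keep the braces so the slice is a complete JSON document.
        string json = script.Substring(open, close - open + 1);

        // Throws JsonException if the literal is not valid JSON.
        using JsonDocument doc = JsonDocument.Parse(json);
        return doc.RootElement.TryGetProperty(property, out JsonElement value)
               && value.ValueKind == JsonValueKind.String
            ? value.GetString()
            : null;
    }
}
```

With the scr value from the answer, ScriptJson.ReadStringProperty(scr, "name") should give back John.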

I'm not really clear on what you wanted to do with it
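
One thing to watch for on a real page: //script matches every script element in the document and SelectSingleNode returns only the first, which may not be the one holding texto. SelectSingleNode also returns null when nothing matches. The sketch below scopes the query to the offercontentinner div from the question and handles the null case. ScriptFinder and FindOfferScript are hypothetical names, and the class test is an exact string comparison that happens to fit the sample HTML.

```csharp
using System;
using HtmlAgilityPack;

public static class ScriptFinder
{
    // Hypothetical helper: returns the text of the first <script> inside
    // <div class="offercontentinner">, or null if there is no such element.
    public static string? FindOfferScript(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Exact match on the class attribute; a div carrying several classes
        // would need contains(@class, '...') instead.
        HtmlNode node = doc.DocumentNode.SelectSingleNode(
            "//div[@class='offercontentinner']//script");

        // Trim the whitespace that surrounds the statement in the markup.
        return node?.InnerText.Trim();
    }
}
```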

https://dotnetfiddle.net/Uinjl6
