How to extract specific info under a specific class?

Time:08-04

Given an HTML string like the one below:

...
<ul  ...>
    ... <!-- some "<li> <a href.. > </a> </li>" here -->
</ul>

<ul  ...>
    <li>
        <a href="/some-text-1">some text 1</a>
    </li>
    <li>
        <a href="/some-text-2">some text 2</a>
    </li>
    <li>
        <a href="/some-text-3">some text 3</a>
    </li>
    ...
</ul>

<ul  ...>
    ... <!-- some "<li> <a href.. > </a> </li>" here -->
</ul>

I can select the element with the class yes-this-class using the following code:

using HtmlAgilityPack; // needed for the unqualified HtmlNode type below

HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
document.LoadHtml(responseString.ToString());
HtmlNode htmlNode =
    document.DocumentNode.SelectSingleNode("//*[@class='yes-this-class']");

Then I use string manipulation (a regular expression) to extract the following text:

  • some text 1
  • some text 2
  • some text 3

How can I extract the result above using only HtmlAgilityPack, without a regular expression? I tried the following, but it didn't work:

HtmlNodeCollection htmlNodes =
    document.DocumentNode.SelectNodes("//*[@class='yes-this-class']/[@a='href']");
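The attempted expression has two problems: `/[@a='href']` is a step with a predicate but no node test, which is not valid XPath, and `@a='href'` looks for an attribute named `a` with the value "href", whereas `a` is an element and `href` is its attribute. A minimal sketch of a corrected query follows, assuming the HtmlAgilityPack NuGet package and a hypothetical sample string modeled on the markup above; the names `AnchorExtractor` and `ExtractLinks` are illustrative only. It selects the `<a>` elements under the element with class `yes-this-class` and reads both their text and their `href`.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class AnchorExtractor
{
    // Hypothetical helper: returns (text, href) pairs for every <a href> inside the target element.
    public static List<(string Text, string Href)> ExtractLinks(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // "//a[@href]" selects <a> elements that have an href attribute.
        HtmlNodeCollection links =
            document.DocumentNode.SelectNodes("//*[@class='yes-this-class']//a[@href]");

        var result = new List<(string Text, string Href)>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (links == null)
        {
            return result;
        }

        foreach (HtmlNode link in links)
        {
            // DeEntitize turns entities such as &amp; back into plain characters.
            string text = HtmlEntity.DeEntitize(link.InnerText).Trim();
            string href = link.GetAttributeValue("href", string.Empty);
            result.Add((text, href));
        }

        return result;
    }

    public static void Main()
    {
        // Hypothetical sample input: only the second <ul> carries the target class.
        const string html = @"
<ul class=""other"">
    <li><a href=""/ignore-me"">ignore me</a></li>
</ul>
<ul class=""yes-this-class"">
    <li>
        <a href=""/some-text-1"">some text 1</a>
    </li>
    <li>
        <a href=""/some-text-2"">some text 2</a>
    </li>
</ul>";

        foreach (var (text, href) in ExtractLinks(html))
        {
            Console.WriteLine($"{text} -> {href}");
        }
    }
}
```

Selecting element nodes rather than text nodes keeps each link's text and URL together, which is convenient if the `href` values are needed later.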

Answer:

Use the following XPath query:

//ul[@class='yes-this-class']/li//text()

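Then call Trim() on each result to remove the leading and trailing whitespace, and skip any result that is empty after trimming: the indentation between <li> and <a> is also returned, as whitespace-only text nodes.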
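A minimal sketch of the answer's approach, assuming the HtmlAgilityPack NuGet package and a hypothetical sample string modeled on the question's markup; `ListTextExtractor` and `ExtractTexts` are illustrative names, not part of the library. It runs the XPath query above, trims each text node, and drops the whitespace-only ones.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class ListTextExtractor
{
    // Hypothetical helper: trimmed, non-empty text under the <li> items of the target <ul>.
    public static List<string> ExtractTexts(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        HtmlNodeCollection textNodes =
            document.DocumentNode.SelectNodes("//ul[@class='yes-this-class']/li//text()");

        // SelectNodes returns null rather than an empty collection when nothing matches.
        if (textNodes == null)
        {
            return new List<string>();
        }

        return textNodes
            // Decode entities such as &amp;, then strip surrounding whitespace.
            .Select(node => HtmlEntity.DeEntitize(node.InnerText).Trim())
            // Indentation between tags comes back as text nodes that trim to "".
            .Where(text => text.Length > 0)
            .ToList();
    }

    public static void Main()
    {
        // Hypothetical sample input: only the second <ul> carries the target class.
        const string html = @"
<ul class=""other"">
    <li><a href=""/ignore-me"">ignore me</a></li>
</ul>
<ul class=""yes-this-class"">
    <li>
        <a href=""/some-text-1"">some text 1</a>
    </li>
    <li>
        <a href=""/some-text-2"">some text 2</a>
    </li>
    <li>
        <a href=""/some-text-3"">some text 3</a>
    </li>
</ul>";

        foreach (string text in ExtractTexts(html))
        {
            Console.WriteLine(text);
        }
    }
}
```

Note that `@class='yes-this-class'` only matches when the attribute value is exactly that string. If the `<ul>` carries several classes, the usual XPath 1.0 idiom is `contains(concat(' ', normalize-space(@class), ' '), ' yes-this-class ')`.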
