Given an HTML string like the one below:
...
<ul ...>
... <!-- some "<li> <a href.. > </a> </li>" here -->
</ul>
<ul ...>
<li>
<a href="/some-text-1">some text 1</a>
</li>
<li>
<a href="/some-text-2">some text 2</a>
</li>
<li>
<a href="/some-text-3">some text 3</a>
</li>
...
</ul>
<ul ...>
... <!-- some "<li> <a href.. > </a> </li>" here -->
</ul>
I can select the element whose class is yes-this-class using the following code:
HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
document.LoadHtml(responseString.ToString());
HtmlNode htmlNode =
document.DocumentNode.SelectSingleNode("//*[@class='yes-this-class']");
Then I use string manipulation (a regular expression) to extract the following text:
- some text 1
- some text 2
- some text 3
How can I extract the result above using only HtmlAgilityPack, without a regular expression? I tried the following, but it didn't work:
HtmlNodeCollection htmlNodes =
document.DocumentNode.SelectNodes("//*[@class='yes-this-class']/[@a='href']");
Answer:
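Use the following XPath query:
//ul[@class='yes-this-class']/li//text()
Then call Trim() on each result to remove leading and trailing whitespace, and skip any strings that are empty afterwards: the query also matches the whitespace-only text nodes between the <li> and <a> tags.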
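A minimal C# sketch of the approach above, assuming the HtmlAgilityPack NuGet package and that the target list is marked up as `<ul class="yes-this-class">` (the question elides that attribute). The names `LinkTextExtractor` and `ExtractLinkTexts` are hypothetical, chosen only for illustration. It runs the XPath query, trims each text node, and drops the whitespace-only ones:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

// Hypothetical helper class, for illustration only.
public static class LinkTextExtractor
{
    // Returns the trimmed, non-empty text under each <li> of the <ul>
    // whose class attribute is exactly "yes-this-class".
    public static List<string> ExtractLinkTexts(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // By default SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection textNodes =
            document.DocumentNode.SelectNodes("//ul[@class='yes-this-class']/li//text()");
        if (textNodes == null)
        {
            return new List<string>();
        }

        return textNodes
            .Select(node => HtmlEntity.DeEntitize(node.InnerText).Trim()) // decode entities, trim whitespace
            .Where(text => text.Length > 0) // drop whitespace-only text nodes between tags
            .ToList();
    }
}
```

A possible variation is to select `//ul[@class='yes-this-class']/li/a` instead and read each node's `InnerText`, which avoids the whitespace-only text nodes altogether. Note that `@class='yes-this-class'` compares the whole attribute value, so it will not match an element such as `class="nav yes-this-class"`.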