I have the following code that grabs the text nodes under certain descendants of specific tags/classes. It was working before, but I haven't run this program in a couple of months (nobody else has touched it), so I'm wondering why it's throwing an error now. My nodesList query looks like this:
var nodesList = doc.DocumentNode
.SelectNodes("//article[@class='article-content']//div[@class='article-content-block']//text()[not(parent::script)]")
.Select(node => node.InnerText).ToList();
When I look at the web page, there are multiple paragraph and ul tags that match that XPath query, but building nodesList throws:
System.ArgumentNullException: 'Value cannot be null. (Parameter 'source')'
The DocumentNode has the name #document, which I would expect is normal, and its InnerHtml shows the entire page's HTML. However, its InnerText shows "Javascript must be enabled for the correct page display". Any ideas as to why it would be throwing on null? I don't recall seeing "Javascript must be enabled for the correct page display" in the DocumentNode's InnerText before, so I'm wondering if that has something to do with it.
User response:
It sounds like the web page content is being loaded dynamically. That's not a problem for your browser, because it executes JavaScript automatically, but HtmlAgilityPack and the .NET HTTP classes don't execute scripts at all. You should be able to use your browser's dev tools (the Network tab) to find which request actually returns the content you're looking for, and then replicate that request in your code.
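A minimal sketch of turning the symptom from the question into an explicit check, assuming the placeholder wording the question reports ("Javascript must be enabled for the correct page display") is what the server returns when it won't serve the rendered article. The PageChecks class and IsJavaScriptPlaceholder method are hypothetical names. This is a text heuristic only; it does not run the page's scripts.

```csharp
using System;

public static class PageChecks
{
    // Hypothetical marker text, copied from the message the question reports
    // seeing in DocumentNode.InnerText.
    private const string PlaceholderMarker = "Javascript must be enabled";

    // Returns true when the extracted page text looks like the "enable JavaScript"
    // stub rather than the real article. The comparison ignores case because
    // sites spell "JavaScript" with varying capitalization.
    public static bool IsJavaScriptPlaceholder(string? innerText) =>
        innerText is not null &&
        innerText.Contains(PlaceholderMarker, StringComparison.OrdinalIgnoreCase);
}
```

Calling PageChecks.IsJavaScriptPlaceholder(doc.DocumentNode.InnerText) right after loading the document lets the program report that it received the JavaScript stub page, instead of failing later on the XPath query.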
It could also be that something else about your request isn't playing nice with the server - missing/bad HTTP headers, unexpected TLS version, maybe even firewall stuff - causing it to return a different response.
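If headers are the problem, one way to test that is to send a request that looks more like a browser's. This is a sketch under assumptions: the header values below are illustrative, not values this site is known to check, and PageFetcher, CreateRequest and FetchHtmlAsync are hypothetical names. The request headers your browser's dev tools show for the same page are a more reliable guide than these defaults.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class PageFetcher
{
    // Builds a GET request with browser-like headers. The values are
    // illustrative assumptions, not headers the target site is known to require.
    public static HttpRequestMessage CreateRequest(Uri url)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.TryAddWithoutValidation(
            "User-Agent",
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)");
        request.Headers.TryAddWithoutValidation("Accept", "text/html,application/xhtml+xml");
        request.Headers.TryAddWithoutValidation("Accept-Language", "en-US,en;q=0.9");
        return request;
    }

    // Sends the request and returns the raw HTML. EnsureSuccessStatusCode throws
    // HttpRequestException on an error status (e.g. 403), so that case surfaces
    // immediately instead of being parsed as if it were the article.
    public static async Task<string> FetchHtmlAsync(HttpClient client, Uri url)
    {
        using HttpRequestMessage request = CreateRequest(url);
        using HttpResponseMessage response = await client.SendAsync(request);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

The string returned by FetchHtmlAsync can be passed to HtmlAgilityPack's HtmlDocument.LoadHtml before running the same XPath query. Reusing a single HttpClient instance across requests, rather than creating one per call, is the usual recommendation.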
User response:
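Can you try getting the text into a string and checking its .Length, or guarding against null before doing anything with it:
string phrase = doc.DocumentNode.InnerText;
Console.WriteLine(phrase?.Length ?? 0); // how much text actually came back
if (string.IsNullOrEmpty(phrase))
{
    return; // nothing to parse, so stop here
}
// ...process phrase here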
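The exception message in the question points at the cause: HtmlAgilityPack's SelectNodes returns null, not an empty collection, when the XPath matches nothing, and LINQ's Enumerable.Select throws ArgumentNullException for its parameter named source when handed null. A minimal sketch of a guard, assuming you would rather get an empty list (and then log or inspect the page) than crash; the EnumerableExtensions class and OrEmpty method are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class EnumerableExtensions
{
    // Treats a null sequence as empty. Because SelectNodes returns null when
    // nothing matches, inserting this before .Select(...) avoids the
    // ArgumentNullException ("Parameter 'source'").
    public static IEnumerable<T> OrEmpty<T>(this IEnumerable<T>? source) =>
        source ?? Enumerable.Empty<T>();
}
```

In the question's code, calling .OrEmpty() right after SelectNodes(...) and before .Select(node => node.InnerText) turns the crash into an empty nodesList, and a Count of 0 then signals that the downloaded HTML does not contain the expected article markup.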