I want to remove all <div> elements, including the ones with attributes such as class, from my string.
I already checked here, and regex is apparently not the answer.
CodePudding user response:
This can be achieved without regular expressions by using a WebBrowser control. Try the following:
ExtractDesiredData:
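Private Function ExtractDesiredData(html As String) As List(Of String)
    Dim result As New List(Of String)()

    'WebBrowser lives in System.Windows.Forms and must be created on an STA thread
    Using wb As New WebBrowser()
        'load an empty page so that a document is available
        wb.Navigate(New Uri("about:blank"))

        'create reference
        Dim doc As HtmlDocument = wb.Document

        'add html to document
        doc.Write(html)

        'loop through body elements
        For Each elem As HtmlElement In doc.Body.All
            'InnerHtml may be Nothing for an empty element
            Dim inner As String = If(elem.InnerHtml, String.Empty)

            'keep only innermost DIVs: compare case-insensitively and look for a nested "<div" tag
            'rather than the bare text "DIV", which would also match ordinary content
            If String.Equals(elem.TagName, "DIV", StringComparison.OrdinalIgnoreCase) AndAlso
               inner.IndexOf("<div", StringComparison.OrdinalIgnoreCase) < 0 Then
                Debug.WriteLine($"DIV elem InnerHtml: '{inner}'")

                'add
                result.Add(inner)
            End If
        Next
    End Using

    Return result
End Function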
Usage:
Dim html As String = "<div class=""fade-content""><div><span>some content</span></div></div>"
html &= vbCrLf & "<div>some content</div>"
Dim desiredData As List(Of String) = ExtractDesiredData(html)
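If a single string is needed rather than a list, the extracted fragments can be joined afterwards. The following is only a minimal sketch: the JoinFragments helper name and the sample values are hypothetical and stand in for whatever ExtractDesiredData actually returns for your input.

```vb
Imports System
Imports System.Collections.Generic

Module JoinExample
    ' Hypothetical helper: joins the extracted fragments into one string, one fragment per line.
    Function JoinFragments(fragments As List(Of String)) As String
        Return String.Join(Environment.NewLine, fragments)
    End Function

    Sub Main()
        ' Sample values (an assumption) standing in for the result of ExtractDesiredData(html).
        Dim desiredData As New List(Of String) From {"<span>some content</span>", "some content"}
        Console.WriteLine(JoinFragments(desiredData))
    End Sub
End Module
```

Use a different separator (for example String.Empty) if the fragments should be concatenated without line breaks.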