Remove all div elements from string using vb.net

Time:04-04

I want to remove all div elements, including the ones with attributes like class, from my string. I already checked here, so regex is apparently not the answer.

CodePudding user response:

This can be achieved without regular expressions by using a WebBrowser control. Try the following:

ExtractDesiredData:

Private Function ExtractDesiredData(html As String) As List(Of String)
    Dim result As List(Of String) = New List(Of String)()

    'create new instance
    Using wb As WebBrowser = New WebBrowser()
        wb.Navigate(New Uri("about:blank"))

        'create reference
        Dim doc As HtmlDocument = wb.Document

        'add html to document
        doc.Write(html)

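        'loop through body elements, keeping only innermost DIVs (no nested DIV)
        'note: checking descendants avoids a NullReferenceException when InnerHtml is Nothing
        'and avoids false matches on text that merely contains "DIV"
        For Each elem As HtmlElement In doc.Body.All
            If elem.TagName.Equals("DIV", StringComparison.OrdinalIgnoreCase) AndAlso
               elem.GetElementsByTagName("div").Count = 0 Then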
                Debug.WriteLine($"DIV elem InnerHtml: '{elem.InnerHtml}'")

                'add
                result.Add(elem.InnerHtml)
            End If
        Next
    End Using

    Return result
End Function

Usage:

Dim html As String = "<div class=""fade-content""><div><span>some  content</span></div></div>"
html &= vbCrLf & "<div>some  content</div>"

Dim desiredData As List(Of String) = ExtractDesiredData(html)
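
ExtractDesiredData returns a list of fragments rather than a single string, so one more step is needed if the goal is a single string with the div wrappers gone. The following is a minimal sketch, assuming that joining the fragments with line breaks is acceptable. JoinFragments is a hypothetical helper that is not part of the original answer, and the sample values only stand in for what ExtractDesiredData might return (the exact InnerHtml text depends on the browser engine).

```vbnet
Imports System
Imports System.Collections.Generic

Module JoinExample
    ' Hypothetical helper: combines the extracted fragments into one string,
    ' one fragment per line.
    Function JoinFragments(fragments As IEnumerable(Of String)) As String
        Return String.Join(Environment.NewLine, fragments)
    End Function

    Sub Main()
        ' Stand-in values (assumed) for the list returned by ExtractDesiredData.
        Dim desiredData As New List(Of String) From {
            "<span>some  content</span>",
            "some  content"
        }

        Dim cleaned As String = JoinFragments(desiredData)
        Console.WriteLine(cleaned)
    End Sub
End Module
```

Any other separator can be passed to String.Join if line breaks are not wanted.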
