How to get inner HTML, or just text, of a tag?

Time:10-03

How do I get the anchor text in the example below? Here is my Go code. I can get the values of href and title when I match on html.ElementNode. I need to get the text using only golang.org/x/net/html, with no other libraries.

Example: From <a href="https:xyz.com">Text XYZ</a>, I want to get "Text XYZ".

// Matching on html.ElementNode gets the href and title values, but matching on html.TextNode yields no text.
if n.Type == html.TextNode && n.Data == "a" {
    for _, a := range n.Attr {
        if a.Key == "href" {
            text = a.Val
        }
    }
}

Answer:

Given the HTML:

<a href="http://example.com/1">Go to <b>example</b> 1</a>
<p>Some para text</p>
<a href="http://example.com/2">Go to <b>example</b> 2</a>

Do you expect just the text?

Go to example 1
Go to example 2

Do you expect the inner HTML?

Go to <b>example</b> 1
Go to <b>example</b> 2

Or, do you expect something else?

The following program produces both the plain text and the inner HTML. Each time it finds an anchor node, it saves that node and continues down the node's subtree. While a saved anchor is set, it appends the Data of each TextNode it visits to a string and writes inner HTML to a buffer with html.Render. After all the children have been traversed and control returns to the saved anchor node, it prints the text string and the HTML buffer, resets both variables, and sets the saved anchor back to nil.

I got the idea of using a buffer with html.Render, and of saving a particular node, from the question "Golang parse HTML, extract all content with tags".

The following is also in the Playground:

package main

import (
    "bytes"
    "fmt"
    "io"
    "strings"

    "golang.org/x/net/html"
)

func main() {
    s := `
    <a href="http://example.com/1">Go to <b>example</b> 1</a>
    <p>Some para text</p>
    <a href="http://example.com/2">Go to <b>example</b> 2</a>
    `

    doc, err := html.Parse(strings.NewReader(s))
    if err != nil {
        panic(err)
    }

    var nAnchor *html.Node
    var sTxt string
    var bufInnerHtml bytes.Buffer

    w := io.Writer(&bufInnerHtml)

    var f func(*html.Node)
    f = func(n *html.Node) {
        if n.Type == html.ElementNode && n.Data == "a" {
            nAnchor = n
        }

        if nAnchor != nil {
            // Render only the anchor's direct children. html.Render already
            // writes each child's whole subtree, so rendering every
            // descendant would emit nested content twice.
            if n.Parent == nAnchor {
                html.Render(w, n)
            }
            if n.Type == html.TextNode {
                sTxt += n.Data // accumulate text from every TextNode under the anchor
            }
        }

        for c := n.FirstChild; c != nil; c = c.NextSibling {
            f(c)
        }

        if n == nAnchor {
            fmt.Println("Text:", sTxt)
            fmt.Println("InnerHTML:", bufInnerHtml.String())
            sTxt = ""
            bufInnerHtml.Reset()
            nAnchor = nil
        }
    }
    f(doc)
}

Output:

Text: Go to example 1
InnerHTML: Go to <b>example</b> 1
Text: Go to example 2
InnerHTML: Go to <b>example</b> 2