Can the go-colly library crawl all HTML tags and text content under a div tag? If so, how? I can already get all the text under a div tag, like this:
c.OnHTML("body .post-topic-main .post-topic-des", func(e *colly.HTMLElement) {
	text = strings.TrimSpace(e.Text)
})
But I don't know how to get the HTML tags under the div tag.
CodePudding user response:
If you are looking for the inner HTML, it is accessible through the DOM field (the underlying goquery selection) by calling its Html method (e.DOM.Html()):
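c.OnHTML("body .post-topic-main .post-topic-des", func(e *colly.HTMLElement) {
	// Html returns the inner HTML of the matched element: child tags plus text.
	html, err := e.DOM.Html()
	if err != nil {
		log.Println(err)
		return
	}
	log.Println(html)
})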
If you are looking for a specific tag under the matched element, ForEach can be used for this purpose. The first argument is the selector and the second is the callback function. The callback is invoked once for each element that matches the selector and is a descendant of the e element.
More information: https://pkg.go.dev/github.com/gocolly/colly#HTMLElement.ForEach
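c.OnHTML("body .post-topic-main .post-topic-des", func(e *colly.HTMLElement) {
	// Text of the matched element, including all of its descendants.
	text := strings.TrimSpace(e.Text)
	log.Println(text)
	e.ForEach("div", func(_ int, el *colly.HTMLElement) {
		// Use el (the nested match), not e (the outer element).
		text := strings.TrimSpace(el.Text)
		log.Println(text)
	})
})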