I'm making a web scraper in Go. Given a specific web page, I'm trying to get the name of the seller, which is shown in the top right corner (in this example, on this OLX listing, the seller's name is Ionut). When I run the code below, it should write the name to the index.csv file, but the file is empty. I think the problem is in the HTML parser, though it looks fine to me.
package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"os"
	"path/filepath"

	"github.com/gocolly/colly"
)

func main() {
	// setting up the file where we store collected data
	fName := filepath.Join("D:\\", "go projects", "cwst go", "CWST-GO", "target folder", "index.csv")
	file, err := os.Create(fName)
	if err != nil {
		log.Fatalf("Could not create file, error :%q", err)
	}
	defer file.Close()

	// writer that writes the collected data into our file
	writer := csv.NewWriter(file)
	// Flush copies whatever is still buffered in the writer into the file;
	// deferred calls run last-in, first-out, so this runs before file.Close()
	defer writer.Flush()

	// collector
	c := colly.NewCollector(
		colly.AllowedDomains("https://www.olx.ro/"),
	)

	// HTML parser
	c.OnHTML(".css-1fp4ipz", func(e *colly.HTMLElement) { // div class that contains wanted info
		writer.Write([]string{
			e.ChildText("h4"), // specific tag of the info
		})
	})

	fmt.Printf("Scraping page : ")
	c.Visit("https://www.olx.ro/d/oferta/bmw-xdrixe-seria-7-2020-71000-tva-IDgp7iN.html")
	log.Printf("\n\nScraping Complete\n\n")
	log.Println(c)
}
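The writer setup itself looks sound: deferred calls run last-in, first-out, so writer.Flush() executes before file.Close(), and any record passed to writer.Write would end up in index.csv. An empty file therefore points to the OnHTML callback never running at all, rather than to the CSV code. The stdlib-only sketch below illustrates csv.Writer buffering, assuming a bytes.Buffer in place of the file; flushDemo is a hypothetical helper, not part of the original code.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// flushDemo is a hypothetical helper: it writes one record through a
// csv.Writer and reports what the underlying buffer holds before and
// after Flush.
func flushDemo(name string) (before, after string, err error) {
	var buf bytes.Buffer // stands in for the *os.File used in the question
	w := csv.NewWriter(&buf)

	// Write only stores the record in the csv.Writer's internal buffer.
	if err := w.Write([]string{name}); err != nil {
		return "", "", err
	}
	before = buf.String()

	// Flush copies the buffered record into buf; Error reports any failure.
	w.Flush()
	if err := w.Error(); err != nil {
		return "", "", err
	}
	after = buf.String()
	return before, after, nil
}

func main() {
	before, after, err := flushDemo("Ionut")
	if err != nil {
		panic(err)
	}
	fmt.Printf("before Flush: %q\n", before) // before Flush: ""
	fmt.Printf("after Flush:  %q\n", after)  // after Flush:  "Ionut\n"
}
```

If Write had been called even once, the deferred Flush would have put at least one line in the file.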
Answer:
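You don't need to add https:// or the trailing / in the allowed domains. colly.AllowedDomains expects bare host names, and colly compares each entry exactly against the host of the URL being visited (www.olx.ro here). Because "https://www.olx.ro/" never matches that host, c.Visit rejects the request with an error (which the original code ignores), the OnHTML callback never runs, and nothing is written to the CSV file. Use the host name instead: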
c := colly.NewCollector(
	colly.AllowedDomains("www.olx.ro"),
)
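To see why the scheme and slash break the match, here is a stdlib sketch of an exact host-name comparison of the kind described above. hostAllowed is a hypothetical, simplified helper, not colly's own code: it parses the URL with net/url and compares u.Hostname() against each allowed entry.

```go
package main

import (
	"fmt"
	"net/url"
)

// hostAllowed is a hypothetical, simplified domain check: it extracts the
// host name from rawURL and compares it for exact equality with each entry.
// It only illustrates the idea and is not colly's implementation.
func hostAllowed(rawURL string, allowed []string) (bool, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false, err
	}
	host := u.Hostname() // e.g. "www.olx.ro": no scheme, no path, no port
	for _, d := range allowed {
		if d == host {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	page := "https://www.olx.ro/d/oferta/bmw-xdrixe-seria-7-2020-71000-tva-IDgp7iN.html"

	// Entry from the question: includes a scheme and a trailing slash.
	ok, _ := hostAllowed(page, []string{"https://www.olx.ro/"})
	fmt.Println("https://www.olx.ro/ ->", ok) // https://www.olx.ro/ -> false

	// Entry from the answer: the bare host name.
	ok, _ = hostAllowed(page, []string{"www.olx.ro"})
	fmt.Println("www.olx.ro ->", ok) // www.olx.ro -> true
}
```

Under this exact comparison, an entry such as "olx.ro" would not match "www.olx.ro" either, so the entry should be the host exactly as it appears in the URL.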