Golang table webscraping

Time:06-17

I have the code below to scrape a specific cell value from an HTML table. You can go to https://www.haremaltin.com/altin-fiyatlari and search for "satis__ATA_ESKI" in the browser's inspect mode to see that value. I am a beginner in Go and did my best, but unfortunately I couldn't get that value. Can anybody help? By the way, they don't have a community API. One more thing: please add a time.Sleep to wait for the page to load. If it returns "-", it is because the page wasn't loaded yet.

package main

import (
    "fmt"
    "log"
    "net/http"

    "github.com/PuerkitoBio/goquery"
)

func main() {
    url := "https://www.haremaltin.com/altin-fiyatlari"

    resp, err := http.Get(url)
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()
    if resp.StatusCode != 200 {
        log.Fatalf("failed to fetch data: %d %s", resp.StatusCode, resp.Status)
    }

    doc, err := goquery.NewDocumentFromReader(resp.Body)
    if err != nil {
        log.Fatal(err)
    }

    doc.Find("tr__ATA_ESKI tr").Each(func(j int, tr *goquery.Selection) {
        data := []string{}
        tr.Find("td").Each(func(ix int, td *goquery.Selection) {
            e := td.Text()
            data = append(data, e)
            fmt.Println(data)
        })
    })
}

CodePudding user response:

Since the table is populated by JavaScript, I would suggest a different approach. Here's why.

What you're really scraping is

curl https://www.haremaltin.com/altin-fiyatlari > out.html

this web page. You can run this curl command in a terminal and get the same response as Go's HTTP GET request ("exact" is a strong word, but it holds most of the time, and certainly in this case).

As you can see, no values are present in the out.html file you created. That's why your Go program isn't returning any values.

You need JavaScript to run so that it populates the page; only then can you scrape it.
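To make this concrete, here is a small self-contained sketch that reproduces the effect without touching the real site. It assumes a hypothetical page served from a local test server: the markup, the `/prices` path, and the helper names `fetchBody` and `hasPlaceholder` are invented for illustration and are not taken from haremaltin.com.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
)

// page mimics a JavaScript-driven table: the cell ships with a "-" placeholder
// and a script would fill it in later. This markup is a hypothetical stand-in,
// not the real site's HTML.
const page = `<html><body>
<table><tr id="tr__ATA_ESKI"><td id="satis__ATA_ESKI">-</td></tr></table>
<script>fetch("/prices").then(function (r) { /* fill the cell */ });</script>
</body></html>`

// newStaticServer serves the page above from a local test server.
func newStaticServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, page)
	}))
}

// fetchBody returns the raw response body, which is all that http.Get (or curl) sees.
func fetchBody(url string) (string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// hasPlaceholder reports whether the element with the given id still holds "-".
func hasPlaceholder(body, id string) bool {
	return strings.Contains(body, fmt.Sprintf("id=%q>-<", id))
}

func main() {
	srv := newStaticServer()
	defer srv.Close()

	body, err := fetchBody(srv.URL)
	if err != nil {
		log.Fatal(err)
	}
	// The script is delivered as text but never executed, so the cell is still "-".
	fmt.Println(hasPlaceholder(body, "satis__ATA_ESKI")) // prints true
}
```

No amount of sleeping on the Go side changes this result, because nothing in the HTTP client ever executes the script.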

I've used https://github.com/chromedp/chromedp in a couple of projects with great success. With this tool, your workflow will look something like this:

  1. Open a headless browser
  2. Go to the URL
  3. Dump the page's HTML
  4. Parse it with goquery
  5. Print your result

CodePudding user response:

You can fetch the data with an HTTP POST request. Do not forget to add the X-Requested-With header to the request.

import (
    "encoding/json"
    "io"
    "net/http"
    "strings"
)

// fetchData sends the POST request described above and decodes the JSON reply.
func fetchData() (map[string]interface{}, error) {
    body := strings.NewReader("dil_kodu=tr")
    req, err := http.NewRequest("POST", "https://www.haremaltin.com/dashboard/ajax/doviz", body)
    if err != nil {
        return nil, err
    }
    // The body is form-encoded, so declare it as such.
    req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
    // Mark the request as an AJAX call.
    req.Header.Set("X-Requested-With", "XMLHttpRequest")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()

    // io.ReadAll replaces the deprecated ioutil.ReadAll.
    jsonData, err := io.ReadAll(resp.Body)
    if err != nil {
        return nil, err
    }
    var data map[string]interface{}
    if err := json.Unmarshal(jsonData, &data); err != nil {
        return nil, err
    }
    return data, nil
}
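The function above returns a generic map, so the price still has to be dug out of it. Below is a minimal sketch of one way to do that. The `lookup` helper is hypothetical, and the sample payload and the key path `data` / `ATA_ESKI` / `satis` are guesses made for illustration; print the real response first to see its actual layout.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// sample is a made-up payload used only for this illustration; the layout of
// the real response is an assumption and must be checked.
const sample = `{"data":{"ATA_ESKI":{"alis":"100","satis":"101"}}}`

// lookup walks nested JSON objects along path and reports whether every key
// along the way was found.
func lookup(data map[string]interface{}, path ...string) (interface{}, bool) {
	var cur interface{} = data
	for _, key := range path {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return nil, false // hit a non-object before the path ended
		}
		cur, ok = m[key]
		if !ok {
			return nil, false // key missing at this level
		}
	}
	return cur, true
}

func main() {
	var data map[string]interface{}
	if err := json.Unmarshal([]byte(sample), &data); err != nil {
		log.Fatal(err)
	}
	v, ok := lookup(data, "data", "ATA_ESKI", "satis")
	fmt.Println(v, ok) // prints: 101 true
}
```

Once the response layout is known, decoding into a struct with json tags would give type-checked field access instead of a run of type assertions.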