How to scrape data from an e-commerce website

Time:04-20

I want to get the "discount" data from url = https://www.lazada.sg/puma-singapore/?q=All-Products&from=wangpu&langFlag=en&pageTypeId=2

But my script does not return it:

function myFunction() {
  const url = 'https://www.lazada.sg/puma-singapore/?q=All-Products&from=wangpu&langFlag=en&pageTypeId=2';

  // parse the data
  function getData(url) {
    const fromText = '<span  data-spm-anchor-id="a2o42.seller.list.i41.62ff63deVng91O">';
    const toText = '</span>';
    const content = UrlFetchApp.fetch(url).getContentText();
    const scraped = Parser
      .data(content)
      .setLog()
      .from(fromText)
      .to(toText)
      .build();
    return scraped;
  }

  const discount = getData(url).replace("%", "").replace(/\-/g, "");
  Logger.log(discount);
}

CodePudding user response:

Looking at the HTML of that URL, the values appear to be inserted by JavaScript, which is probably why the search for the span tag finds nothing. Fortunately, the values are also included in the HTML as JSON data. So, in this answer, I would like to propose retrieving the value by parsing that JSON data. The sample script is as follows.

Sample script:

Please set the name of the item whose discount you want to retrieve.

function myFunction() {
  const itemName = "PUMA Unisex Deck Backpack II"; // Please set the item name.

  const url = 'https://www.lazada.sg/puma-singapore/?q=All-Products&from=wangpu&langFlag=en&pageTypeId=2'
  const content = UrlFetchApp.fetch(url).getContentText();
  const str = content.match(/window.pageData =([\w\s\S]+?});/);
  if (!str || str.length < 1) return;
  const obj = JSON.parse(str[1]);
  const items = obj.mods.listItems.filter(({ name }) => name == itemName);
  if (items.length == 0) return;
  const res = items.map(({ discount }) => discount);
  console.log(res)
}

Testing:

  • When this script is run, [ '-34%', '-34%' ] is obtained. The result has 2 values because there are 2 items named PUMA Unisex Deck Backpack II.

Note:

  • I have confirmed that this script works at the time of writing. However, if the structure of the HTML changes in the future, the script might stop working. Please be careful about this.
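
Because the page structure may change, it can help to isolate the extraction step and make it fail quietly. The following is only a minimal sketch in plain JavaScript: the helper name extractPageData and the sample HTML string are hypothetical, and it assumes the page still assigns a JSON object to window.pageData followed by a semicolon. The lazy match stops at the first "};", so JSON that contains "};" inside a string value would be cut short; in that case JSON.parse fails and the helper returns null.

```javascript
// Hypothetical helper: pull the JSON assigned to window.pageData out of an HTML string.
// Returns the parsed object, or null when the marker is missing or the JSON is invalid.
function extractPageData(html) {
  const match = html.match(/window\.pageData\s*=\s*([\s\S]+?\});/);
  if (!match) return null;
  try {
    return JSON.parse(match[1]);
  } catch (e) {
    return null;
  }
}

// Made-up sample markup for illustration; this is not real Lazada output.
const sampleHtml =
  '<script>window.pageData = {"mods":{"listItems":[' +
  '{"name":"Item A","discount":"-34%"},' +
  '{"name":"Item B","discount":"-10%"}]}};</script>';

const parsed = extractPageData(sampleHtml);
console.log(parsed ? parsed.mods.listItems.map(({ discount }) => discount) : "pageData not found");
```

In Apps Script, the string returned by UrlFetchApp.fetch(url).getContentText() could be passed to this helper in place of sampleHtml.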

CodePudding user response:

Thanks for the support.

However, I want to get the data for all products.

Is there any way to do that?
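
One possible approach, as a minimal sketch only: drop the item-name filter and map over every entry in listItems. The helper name listDiscounts and the sample object are hypothetical; the mods.listItems, name, and discount fields are the ones used in the sample script above. This only covers the items embedded in the fetched page. Whether further result pages exist, and how to fetch them, is not addressed here.

```javascript
// Hypothetical helper: return the name and discount of every item in a parsed pageData object.
function listDiscounts(pageData) {
  // Fall back to an empty list when the expected structure is missing.
  const items = (pageData && pageData.mods && pageData.mods.listItems) || [];
  // Items without a discount field are reported as null instead of undefined.
  return items.map(({ name, discount }) => ({
    name,
    discount: discount === undefined ? null : discount,
  }));
}

// Made-up sample data for illustration; this is not real Lazada output.
const samplePageData = {
  mods: {
    listItems: [
      { name: "Item A", discount: "-34%" },
      { name: "Item B" },
    ],
  },
};

console.log(listDiscounts(samplePageData));
```

In the sample script above, this would mean calling listDiscounts(obj) right after JSON.parse(str[1]) instead of filtering by itemName.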
