Home > database >  Google Apps Script stopped scraping data from Yahoo Finance
Google Apps Script stopped scraping data from Yahoo Finance

Time:01-27

This Google Apps Script code to scrape historical data from from Yahoo Finance stopped working yesterday. It suddenly gives the error - No data (data.length == 0).

I think the bug is in the line 8 script while getting the JSON but I dont't have the necessary skill to fix it.

It woud be appreciate your help with issue.

function Scrapeyahoo(symbol) {
  //Leemos de yahoo finance historical data
  const s = encodeURI(symbol); // so that it works with a url
  // turn it into an URL and call it
  const url = 'https://finance.yahoo.com/quote/'  s  '/history?p='  s;
  const res = UrlFetchApp.fetch(url, { muteHttpExceptions: true }).getContentText();
  const $ = Cheerio.load(res);
  const data = $('script').toArray().reduce((ar, x) => {
    const c = $(x).get()[0].children;
    if (c.length > 0) {
      const d = c[0].data.trim().match(/({"context"[\s\S\w] );\n}\(this\)\);/);
      if (d && d.length == 2) {
        ar.push(JSON.parse(d[1]));
      }
    }
    return ar;
  }, []);
  if (data.length == 0) throw new Error("No data.");
  const header = ["date", "open", "high", "low", "close", "adjclose", "volume"];
  
  var key = Object.entries(data[0]).find(([k]) => !["context", "plugins"].includes(k))[1];
  if (!key) return;
  const cdnjs = "https://cdnjs.cloudflare.com/ajax/libs/crypto-js/4.1.1/crypto-js.min.js";
  eval(UrlFetchApp.fetch(cdnjs).getContentText());
  const obj1 = data[0];
  const obj2 = JSON.parse(CryptoJS.enc.Utf8.stringify(CryptoJS.AES.decrypt(obj1.context.dispatcher.stores, key)));
  const ar = obj2.HistoricalPriceStore.prices.map(o => header.map(h => h == "date" ? new Date(o[h] * 1000) : (o[h] || "")));
  // ---

  return ar
}

The original code was modified in December according to this solution, after it stopped working, but I can't find a solution for the issue now.

CodePudding user response:

It seems that the specification for retrieving the key has been changed. In this case, var key = Object.entries(data[0]).find(([k]) => !["context", "plugins"].includes(k))[1]; doesn't return the correct key. By this, an error occurs at CryptoJS.enc.Utf8.stringify(CryptoJS.AES.decrypt(obj1.context.dispatcher.stores, key)).

In the current stage, when I saw this script, the modified script is as follows.

Modified script:

function Scrapeyahoo(symbol) {
  const s = encodeURI(symbol);
  const url = 'https://finance.yahoo.com/quote/'  s  '/history?p='  s;

  var html = UrlFetchApp.fetch(url).getContentText().match(/root.App.main = ([\s\S\w] ?);\n/);
  if (!html || html.length == 1) return;
  var obj = JSON.parse(html[1].trim());
  var key = [...new Map(Object.entries(obj).filter(([k]) => !["context", "plugins"].includes(k)).splice(-4)).values()].join("");
  if (!key) return;
  const cdnjs = "https://cdnjs.cloudflare.com/ajax/libs/crypto-js/4.1.1/crypto-js.min.js";
  eval(UrlFetchApp.fetch(cdnjs).getContentText());
  const obj1 = JSON.parse(CryptoJS.enc.Utf8.stringify(CryptoJS.AES.decrypt(obj.context.dispatcher.stores, key)));
  const header = ["date", "open", "high", "low", "close", "adjclose", "volume"];
  const ar = obj1.HistoricalPriceStore.prices.map(o => header.map(h => h == "date" ? new Date(o[h] * 1000) : (o[h] || "")));
  return ar
}
  • In this case, Cheerio is not used.

Note:

  • In this sample, in order to load crypto-js, eval(UrlFetchApp.fetch(cdnjs).getContentText()) is used. But, if you don’t want to use it, you can also use this script by copying and pasting the script of https://cdnjs.cloudflare.com/ajax/libs/crypto-js/4.1.1/crypto-js.min.js to the script editor of Google Apps Script. By this, the process cost can be reduced.

  • I can confirm that this method can be used for the current situation (January 27, 2023). But, when the specification in the data and HTML is changed in the future update on the server side, this script might not be able to be used. Please be careful about this.

Reference:

  • Related