Home > Software engineering >  Trigger-run a Google Sheets script to fetch URL data from Yahoo Finance in batches
Trigger-run a Google Sheets script to fetch URL data from Yahoo Finance in batches

Time:06-01

Prelude - this is quite a long post but mostly because of the many pictures to clarify my issue :)

I have been pulling company data from Yahoo! Finance, first for only a few stocks, but currently for hundred of stocks (and soon to be thousands). I am currently pulling this data live, with one urlFetch per stock ticker each time I load the spreadsheet. This presents three problems:

  • It takes a long time for the sheet to load as it is making hundreds of requests to Yahoo!
  • I regularly get rate limited on Google side (max. 20k url fetches per day)
  • I might get rate-limited on Yahoo! Finance side

I am therefore looking for a better way:

  • Instead of calling Yahoo for each ticker each time the spreadsheet loads, I will run a google script that fetches data for each ticker once per day

Consider the following simplified example sheet (tab db):

enter image description here

The current script reads:

function trigger() {
  const db = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('db');
  const tickers = db.getRange('A2:A'   db.getLastRow()).getValues().flat();

  for (var row = 0; row < tickers.length; row  ) {
    var data = yahoo(tickers[row]);
    db.getRange(row   2, 2, 1, 3).setValues(data);
  }
}

function yahoo(ticker) {
  return [ticker, "SOME", "DATA", "MORE"]
}

Please note that for purposes of testing, yahoo() is simply returning an array. In reality, the yahoo() function pulls JSON data from Yahoo! After running the script, the spreadsheet looks like:

enter image description here

So far so good. But if the list is not 3 but 5000 tickers long, running the script as-is will get me rate-limited quickly (or waiting very long for the spreadsheet to load). Therefore, I thought about the following:

  • Each time the script is run, it will only pull data for 5 tickers. The script will be run once per minute. (5 tickers and per minute for the purposes of this thread)
  • The script will keep track of the last time it downloaded data for each ticker

Assume that the list currently looks like this:

enter image description here

Assume that today is May 31st, and the script is run:

enter image description here

Starting at the top of the lists, the script should want to update 5 tickers that have not yet been updated today:

  • BLK in row 2 has already been updated today, so it is ommitted
  • AAPL was last updated yesterday, so it is the first ticker that gets queried with Yahoo!
  • etc.

Now row 2 to 9 have updated data. The second time the script is run, it should update the next five. Again starting from the top, looking for the first 5 tickers either (1) without a "last run" date or (2) a run date before today:

enter image description here

As you can see, lines 11 to 15 are now updated too. TSLA was skipped, because (for whatever reason), it already was updated today.

Here is the same list again, just with 2 more tickers. If the script is run a few times on june 1st, the result will be like this:

enter image description here

  • 3 batches of 5 tickers
  • one batch of 2 tickers

This works great if the Yahoo! Finance service would always return data for each ticker. However, it will not. For example because:

  • it has no data for a certain ticker
  • the request is timing out

I believe I need a solution to keep track of errors while downloading data. Assume the script is run again a few times (triggered by a once-per-minute trigger in Google script) on june 2nd, and have the following result:

enter image description here

We see two tickers (JPM and ORCL) where data could not be updated. Both are marked in the error column, filled by the script-to-be.

Let's assume we run the script again on june 3rd. On this day, JPM data is downloading perfectly, but ORCL again is generating an error. Yahoo! didn't return any data. The error column is updated to 2.

enter image description here

If a ticker did not get data returned for 2 attempts in a row (error = 2), it should be forever skipped. I will fill it in manually at some point, or look into whether I have typed in a non-existing ticker for instance.

Keeping that of errornous downloads prevents the script from getting stuck. Without it, if there are 5 tickers at the top of the list that keep throwing errors, the script will never go past beyond those 5. It will try to attempt to download data from Yahoo over and over for these tickers.

In this last picture, we see the result of the script being run on june 4th. I have colored again the batches (5 tickers) that were updated per run/ minute.

enter image description here

I tried my best to explain how I am thinking of building a error-proof downloading from Yahoo! Finance. In the rest of my spreadsheet, whenever I need metadata from company, I can simple take it from this db tab instead of querying Yahoo! over and over.

My issue is that my scripting skills are limited. I am not overseeing how to start building this. Could someone please:

  • Give me a start with (pseudo) code OR
  • Give feedback on my thoughts. Am I missing something here that could present a problem?

PS. I understand that I am still making 5 urlfetches per time the script is run. It was suggested to me that I should batch these 5 together (which would prevent me from being rate limited at least on Google's side). This is a great idea, but I found it difficult to understand how it works, so I would rather first have a script that works and that I can follow. In a later stage, I will definitely upgrade / make it more efficient :)

If you've read all the way until here, thank you so much. Any help is greatly appreciated!

[EDIT1]: in reality, yahoo() looks like this:

function yahoo(ticker) {
  const url = 'https://query2.finance.yahoo.com/v10/finance/quoteSummary/'   encodeURI(ticker)   '?modules=price,assetProfile,summaryDetail';
  
  let response = UrlFetchApp.fetch(url, { muteHttpExceptions: true });
  if (response.getResponseCode() == 200) {
      var object = JSON.parse(response.getContentText());
  }

  let fwdPE  = object.quoteSummary.result[0]?.summaryDetail?.forwardPE?.fmt || '-';
  let sector = object.quoteSummary.result[0]?.assetProfile?.sector || '-';
  let mktCap = object.quoteSummary.result[0]?.price?.marketCap?.fmt || '-';

  return [[fwdPE, sector, mktCap]];
}

[EDIT2] enter image description here

CodePudding user response:

I believe your goal is as follows.

  • By retrieving the values from column "A", you want to run the function yahoo, and want to update the columns "B" to "F".
    • By checking the date of the column "E", you want to execute the function of yahoo.
      • From your question and showing images, your values of column "E" is the date object. And, you want to check the year, month and day.
    • In this case, you want to execute the function of yahoo only 5 times every running.
    • When an error occurs from yahoo, you want to count up the column "F". When no error occurs, you want to set null to the column "F".
  • You want to execute the script by the time-driven trigger.
    • You can install the time-driven trigger by yourself.

When I saw your script, the script for achieving the above goal is not included. And, setValues is used in a loop. In this case, the process cost will become high.

So, in your situation, in order to achieve your goal, how about the following sample script?

Sample script:

This script can be directly run with the script editor. So, before you run this script using the time-driven trigger, I would like to recommend testing this script.

After you tested this script and could confirm the output situation, please install the time-driven trigger to the function. By this, when you install the time-driven trigger to this function, the script is run by the trigger.

function trigger() {
  const max = 5; // From your question, maximum execution of "yahoo" is 5.

  const todayObj = new Date();
  const today = Utilities.formatDate(todayObj, Session.getScriptTimeZone(), "yyyyMMdd");
  const db = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Sheet1');
  const range = db.getRange('A2:F'   db.getLastRow());

  const { values } = range.getValues().reduce((o, r) => {
    const [ticker, b, c, d, e, f] = r;
    if (o.c < max && (e.toString() == "" || Utilities.formatDate(e, Session.getScriptTimeZone(), "yyyyMMdd") != today)) {
      try {
        o.c  ;
        o.values.push([...yahoo(ticker), todayObj, null]);
      } catch (_) {
        o.values.push([ticker, b, c, d, todayObj, ["", "0"].includes(f.toString()) ? 1 : f   1]);
      }
    } else {
      o.values.push(r);
    }
    return o;
  }, { values: [], c: 0 });
  range.setValues(values);
}


function yahoo(ticker) {
  const url = 'https://query2.finance.yahoo.com/v10/finance/quoteSummary/'   encodeURI(ticker)   '?modules=price,assetProfile,summaryDetail';

  let response = UrlFetchApp.fetch(url, { muteHttpExceptions: true });
  if (response.getResponseCode() == 200) {
    var object = JSON.parse(response.getContentText());
  }

  let fwdPE = object.quoteSummary.result[0]?.summaryDetail?.forwardPE?.fmt || '-';
  let sector = object.quoteSummary.result[0]?.assetProfile?.sector || '-';
  let mktCap = object.quoteSummary.result[0]?.price?.marketCap?.fmt || '-';

  return [ticker, fwdPE, sector, mktCap];
}
  • When this script is run, I thought that your above goal might be able to be achieved. So,

  • From your actual yahoo, the returned value is different from your 1st script. So, I also modified it.

Note:

  • Unfortunately, I cannot imagine the actual script of yahoo(ticker). So, in order to check the error, I used try-catch. In this case, it supposes that when the values are not retrieved, an error occurs in yahoo(ticker). Please be careful about this.

  • I cannot understand your actual script of yahoo(ticker). So, please be careful about this.

  • From your question and showing images, I understood that you wanted to check the year, month and day. Please be careful about this.

Reference:

Added:

From your following additional question,

also, I have added to the example sheet a second tab (db2) if I could ask you to have a brief look. Here, I have added 2 columns in between ticker and the rest of the data that yahoo() is returning. Assume that I want to fill in other data here. Would it be possible to adjust your script so that it leaves these columns alone, so only works on columns A and D to H?

I understood that you want to add the empty 2 columns "B" and "C" to the result array. In this case, please test the following sample script.

Sample script:

function trigger() {
  const max = 5; // From your question, maximum execution of "yahoo" is 5.

  const todayObj = new Date();
  const today = Utilities.formatDate(todayObj, Session.getScriptTimeZone(), "yyyyMMdd");
  const db = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('db2');
  const range = db.getRange('A2:H'   db.getLastRow());

  const { values } = range.getValues().reduce((o, r) => {
    const [ticker, b, c, d, e, f, g, h] = r;
    if (o.c < max && (g.toString() == "" || Utilities.formatDate(g, Session.getScriptTimeZone(), "yyyyMMdd") != today)) {
      try {
        o.c  ;
        o.values.push([...yahoo(ticker), todayObj, null]);
      } catch (_) {
        o.values.push([ticker, b, c, d, e, f, todayObj, ["", "0"].includes(f.toString()) ? 1 : h   1]);
      }
    } else {
      o.values.push(r);
    }
    return o;
  }, { values: [], c: 0 });
  range.setValues(values);
}

function yahoo(ticker) {
  const url = 'https://query2.finance.yahoo.com/v10/finance/quoteSummary/'   encodeURI(ticker)   '?modules=price,assetProfile,summaryDetail';

  let response = UrlFetchApp.fetch(url, { muteHttpExceptions: true });
  if (response.getResponseCode() == 200) {
    var object = JSON.parse(response.getContentText());
  }

  let fwdPE = object.quoteSummary.result[0]?.summaryDetail?.forwardPE?.fmt || '-';
  let sector = object.quoteSummary.result[0]?.assetProfile?.sector || '-';
  let mktCap = object.quoteSummary.result[0]?.price?.marketCap?.fmt || '-';

  return [ticker, null, null, fwdPE, sector, mktCap];
}
  • Both trigger and yahoo functions were modified. And, in order to use your 2nd tab of your provided Spreadsheet, the sheet name was also changed to db2. Please be careful about this.
  • Related