Home > Net >  google sheets, apps script parsehtml error
google sheets, apps script parsehtml error

Time:01-31

this is the complete code,

function extractData() {
  var url = "https://www.theopenalliance.com/teams/2023/";
  var html = UrlFetchApp.fetch(url).getContentText();
  var data = parseHtml(html);
  var sheet = SpreadsheetApp.getActiveSheet();
  sheet.clearContents();
  if (data.length > 0) {
    sheet.getRange(1, 1, data.length, data[0].length).setValues(data);
    for (var i = 0; i < data.length; i  ) {
      for (var j = 0; j < data[i].length; j  ) {
        if (data[i][j].indexOf("http") === 0) {
          var button = sheet.getRange(i   1, j   1).attachButton({
            text: "Link",
            url: data[i][j]
          });
        }
      }
    }
  }
}

function parseHtml(html) {
  var startIndex = html.indexOf("<tbody>");
  var endIndex = html.indexOf("</tbody>");
  var table = html.substring(startIndex, endIndex);
  var rows = table.split("<tr>");
  var data = [];
  for (var i = 1; i < rows.length; i  ) {
    var cells = rows[i].split("<td");
    var row = [];
    for (var j = 1; j < cells.length; j  ) {
      var cell = cells[j];
      var linkStartIndex = cell.indexOf("href=");
      if (linkStartIndex !== -1) {
        var linkEndIndex = cell.indexOf("class");
        var link = cell.substring(linkStartIndex   6, linkEndIndex - 2);
        row.push(link);
      } else {
        row.push(cell.substring(cell.indexOf(">")   1, cell.indexOf("</td>")));
      }
    }
    data.push(row);
  }
  return data;
} 

however function parseHtml(html) gives an error with this line

var startIndex = html.indexOf("<tbody>");

Anyone has any suggestions? i'm trying to copy and paste tables from the link to a google sheets.

i expected to see every teams numbers and other values (Public links, location etc) in google sheets but nothing shows up. Also i was expecting to see buttons that had links attached to them if the buttons exists, such as github, photos etc. Please check the link and im sure you will have a better idea of im trying to tell. Also please help me fix the code, if possible, copy and edit the code than repost it, i would greatly appreciate it

CodePudding user response:

In your situation, how about using Sheets API? Because I thought that the HTML parser of Sheets API is useful for your situation. When Sheets API is used for your URL, how about the following sample script?

Sample script:

Before you use this script, please enable Sheets API at Advanced Google services.

function myFunction() {
  const url = "https://www.theopenalliance.com/teams/2023/"; // This is from your script.

  const html = UrlFetchApp.fetch(url).getContentText();
  const table = html.match(/<table[\s\S\w] ?<\/table>/);
  if (!table) {
    throw new Error("Table was not found.");
  }
  const ss = SpreadsheetApp.getActiveSpreadsheet();
  const sheet = ss.getActiveSheet().clearContents();
  SpreadsheetApp.flush();
  const requests = { requests: [{ pasteData: { html: true, data: table[0], coordinate: { sheetId: sheet.getSheetId() } } }] };
  Sheets.Spreadsheets.batchUpdate(requests, ss.getId());
}
  • When this script is run, a table is retrieved from the URL and put it into the active sheet.

References:

  • Related