Home > Software engineering >  Scraping table from website into Google Sheets
Scraping table from website into Google Sheets

Time:12-10

I am trying to scrape the table on the right of this website: https://www.tennisabstract.com/current/2022WTAAngers125.html

I have tried to use the XML path:

"//*[@id='forecast']"

But it returns an error, is there something I am missing?

When I try to scrape other items from the site, i.e. using:

"//*[@id='pmF']"

I have no issues, and the formula returns the expected value of 'F'.

CodePudding user response:

When I saw the HTML data of your URL, it seems that the table is created by Javascript. In this case, unfortunately, I thought that IMPORTXML cannot be used. But, I think that Google Apps Script might be able to be used for this situation. So, in this answer, I would like to propose achieving your goal using Google Apps Script.

From your following reply,

that is the expected URL. As long as Google Apps Script is free and can export the data straight into a Google Sheet, than happy to use it.

I understood that using Google Apps Script is included in your expected goal. And, your URL is https://www.tennisabstract.com/current/2022WTAAngers125.html.

In this case, how about the following sample script?

Sample script:

Please copy and paste the following script to the script editor of Google Spreadsheet and please set your sheet name you want to put the values to Sheet1 of const sheetName = "Sheet1";. And save the script.

And, before you use this script, please enable Sheets API at Advanced Google services.

And, please run the function myFunction with the script editor. By this, a dialog for authorizing the scopes is opened. Please permit it. By this, the script is run.

function myFunction() {
  const url = "https://www.tennisabstract.com/current/2022WTAAngers125.html"; // This is from your question.
  const sheetName = "Sheet1"; // Please set your sheet name.

  const res = UrlFetchApp.fetch(url);
  const r = res.getContentText().match(/var projCurrent = '([\w\s\S]*?)'/);
  if (r) {
    const ss = SpreadsheetApp.getActiveSpreadsheet();
    const sheet = ss.getSheetByName(sheetName); // Please set your sheet name.
    const requests = [{ pasteData: { html: true, data: r[1].trim(), coordinate: { sheetId: sheet.getSheetId() } } }];
    Sheets.Spreadsheets.batchUpdate({ requests }, ss.getId());
  }
}
  • When this script is run, the values are retrieved from the URL of https://www.tennisabstract.com/current/2022WTAAngers125.html and put into the sheet "Sheet1".

Note:

  • This sample script is for your provided URL of https://www.tennisabstract.com/current/2022WTAAngers125.html. If you change the URL and the structure of the HTML data is changed, this script might not be able to be used. Please be careful about this.

References:

  • Related