How to fetch data which is loaded by the AJAX (asynchronous) method after the web page has already been loaded?

Time:12-08

I have a Google Sheet in which I am trying to fetch the table contents from https://listentotaxman.com using the following formula:

=IMPORTHTML("https://listentotaxman.com/?year=2021&ingr=35000","table",2)

Desired Output:

[Screenshot of the desired table]

However, no matter what I tried, it did not pull the contents of the table. Instead, it fetched only the default table headers and values, not the figures for the entered salary (35000 in this example). The reason is that the table data on the site is loaded after the page itself has loaded. Is there a way to do this web scraping with an Apps Script function? Please help, thank you.

Answer:

I believe your goal is as follows.

  • You want to retrieve the table shown in your question.
  • You want to achieve this using Google Apps Script.

When I tested your URL, the table could not be retrieved directly from it: the figures are filled in by JavaScript after the page loads, and IMPORTHTML only sees the initial HTML. As a workaround, you can request the data from the server endpoint the page uses and build the table by parsing the response. In Google Apps Script, this looks as follows.
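To make the request itself easier to inspect, the payload construction can be separated from the network call. This is a minimal sketch, not part of the original answer: `buildTaxRequest` and `TAX_ENDPOINT` are hypothetical names, the endpoint and field names are copied from the sample script below, and which fields the server actually requires is an assumption. Because it only builds plain objects, it can be checked outside Apps Script.

```javascript
// Hypothetical helper: assembles the POST request that SAMPLE() sends.
// Endpoint and field names are copied from that script; which fields the
// server actually requires is an assumption, not documented behaviour.
const TAX_ENDPOINT = "https://listentotaxman.com/ws/tax/index.js.php";

function buildTaxRequest(grosswage, year) {
  const payload = {
    response: "json", // presumably selects JSON output
    year: year,
    region: "uk",
    married: false,
    blind: false,
    exni: false,
    marriageallowance: false,
    partnerwage: 0,
    plan: "",
    age: "0",
    extra: 0,
    taxcode: "",
    pension: "",
    time: "1",
    grosswage: String(grosswage),
    ingr: String(grosswage), // assumption: tracks the salary being queried
  };
  // Options object in the shape UrlFetchApp.fetch(url, params) accepts.
  return {
    url: TAX_ENDPOINT,
    params: { method: "post", contentType: "application/json", payload: JSON.stringify(payload) },
  };
}
```

In Apps Script, the result would be used as `const { url, params } = buildTaxRequest(35000, 2021); const res = UrlFetchApp.fetch(url, params);`.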

Sample script:

Please copy and paste the following script into the script editor and save it. Then enter the custom formula =SAMPLE(35000, 2021) in a cell. The values are retrieved and written to the spreadsheet, starting at that cell.

/**
 * Returns the listentotaxman.com pay summary for a salary as a 2D array.
 * @param {number} grosswage Yearly gross salary, e.g. 35000.
 * @param {number} year Tax year, e.g. 2021.
 * @customfunction
 */
function SAMPLE(grosswage, year) {
  // 1. Retrieve values from the endpoint the page queries after it has loaded.
  const url = "https://listentotaxman.com/ws/tax/index.js.php";
  const payload = {
    "response": "json", "year": year, "region": "uk", "married": false, "blind": false,
    "exni": false, "marriageallowance": false, "partnerwage": 0, "plan": "", "age": "0",
    "extra": 0, "taxcode": "", "pension": "", "time": "1",
    "grosswage": grosswage.toString(), "ingr": grosswage.toString()
  };
  const params = { method: "post", payload: JSON.stringify(payload), contentType: "application/json" };
  const res = UrlFetchApp.fetch(url, params);

  // 2. Parse values: one yearly amount per table row.
  const obj = JSON.parse(res.getContentText());
  const headers = ["gross_pay", "tax_free_allowance", "taxable_pay", "tax_paid", "tax_due", "student_loan_repayment", "national_insurance", "total_deductions", "net_pay", "employers_ni", "net_change_from_previous"];
  const base = headers.flatMap(h => {
    const temp = [];
    const v = obj[h];
    if (h == "tax_due") {
      // The first three entries fill the "20% rate", "40% rate" and "45% rate" rows.
      Array(3).fill("").forEach((_, i) => temp.push(v[i].amount));
    } else if (h == "total_deductions") {
      // Computed locally as income tax + National Insurance.
      temp.push(Number(obj["tax_paid"]) + Number(obj["national_insurance"]));
    } else if (h == "net_change_from_previous") {
      temp.push(obj["net_pay"] - obj["previous"]["net_pay"]);
    } else {
      temp.push(v);
    }
    return temp;
  });

  // 3. Return the parsed values as a 2-dimensional array.
  const colHead = ["Pay Summary", "%", "Yearly", "Monthly", "Week"];
  const rowHead = ["Gross Pay", "Tax free allowance", "Total taxable", "Total Tax due", "20% rate", "40% rate", "45% rate", "Student Loan", "National Insurance", "Total Deductions", "Net Wage", "Employers NI", "Net change from " + (year - 1)];
  // Each amount becomes [% of gross pay, yearly, monthly, weekly], floored;
  // the amount columns of the last row (net change) have their sign flipped.
  const rows = base.map((e, i, a) => [(e / grosswage) * 100, e, e / 12, e / 52]
    .map((f, j) => i == a.length - 1 && j != 0 ? -Math.floor(f) : Math.floor(f)));
  return [colHead, ...rows.map((e, i) => [rowHead[i], ...e])];
}
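The last step of the script packs the per-row arithmetic into one expression. Unrolled, each yearly amount becomes a percentage of gross pay plus yearly, monthly (divided by 12) and weekly (divided by 52) values, all floored. The sketch below restates that arithmetic as two small functions so it can be checked without calling the site; `toPeriodRow` and `buildTable` are hypothetical names, and the sign flip on the last row is reproduced as-is because the answer does not explain its purpose.

```javascript
// Mirrors the per-row arithmetic at the end of SAMPLE():
// % of gross pay, yearly, monthly (/12) and weekly (/52), each floored.
// flipSign reproduces the script's sign flip on the non-% columns of the last row.
function toPeriodRow(yearlyAmount, grosswage, flipSign = false) {
  const values = [
    (yearlyAmount / grosswage) * 100, // % of gross pay
    yearlyAmount,                     // yearly
    yearlyAmount / 12,                // monthly
    yearlyAmount / 52,                // weekly
  ];
  return values.map((v, j) => (flipSign && j !== 0 ? -Math.floor(v) : Math.floor(v)));
}

// Builds the header row plus one labelled row per amount; the last row gets the sign flip.
function buildTable(rowHead, amounts, grosswage) {
  const colHead = ["Pay Summary", "%", "Yearly", "Monthly", "Week"];
  const rows = amounts.map((amount, i) =>
    [rowHead[i], ...toPeriodRow(amount, grosswage, i === amounts.length - 1)]
  );
  return [colHead, ...rows];
}
```

For example, `toPeriodRow(35000, 35000)` returns `[100, 35000, 2916, 673]`.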

Result:

When this script is used, the following result is obtained. Set the number format of each column as needed.

[Screenshot of the resulting table]
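A custom function can only return values; it cannot change cell formatting, so the number formats have to be set by hand or by a separately run script. The following is a minimal sketch of the latter, assuming the =SAMPLE(...) formula sits in A1 so its 14 x 5 result fills A1:E14. `formatTaxTable` and `formatActiveTaxTable` are hypothetical names, and the format strings are only one possible choice.

```javascript
// Hypothetical helper, run from the script editor or a menu (not as a custom function).
// Assumes =SAMPLE(...) is in A1, so the data rows are 2-14 and columns B-E hold numbers.
function formatTaxTable(sheet) {
  sheet.getRange("B2:B14").setNumberFormat('0"%"');     // % of gross pay (already x100)
  sheet.getRange("C2:E14").setNumberFormat('"£"#,##0'); // yearly, monthly, weekly
}

// Entry point: formats the currently active sheet.
function formatActiveTaxTable() {
  formatTaxTable(SpreadsheetApp.getActiveSheet());
}
```

Passing the sheet in as a parameter keeps `formatTaxTable` independent of which sheet is active.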

Note:

  • This is a simple sample script, so please modify it for your actual situation.
  • I tested only =SAMPLE(35000, 2021), so other parameters might cause an error; this sample script is written only for this question. Also, if the site is unavailable, the values cannot be retrieved.
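One way to soften the failure cases mentioned in the notes is to turn fetch errors into a readable message for the cell instead of an uncaught exception. The following is a minimal sketch, not part of the original answer: `fetchTaxJson` is a hypothetical name, and it relies on UrlFetchApp's `muteHttpExceptions` option so that non-200 responses can be inspected via `getResponseCode()`; network-level failures can still throw, so those are caught too.

```javascript
// Hypothetical wrapper around UrlFetchApp.fetch that returns { data } or { error }.
function fetchTaxJson(url, params) {
  // muteHttpExceptions: HTTP error codes are returned instead of thrown.
  const options = Object.assign({}, params, { muteHttpExceptions: true });
  let res;
  try {
    res = UrlFetchApp.fetch(url, options);
  } catch (e) {
    // DNS failures, timeouts, etc. can still throw.
    return { error: "Request failed: " + e.message };
  }
  const code = res.getResponseCode();
  if (code !== 200) {
    return { error: "HTTP " + code + " from " + url };
  }
  try {
    return { data: JSON.parse(res.getContentText()) };
  } catch (e) {
    return { error: "Response was not JSON" };
  }
}
```

Inside SAMPLE(), `const r = fetchTaxJson(url, params); if (r.error) return r.error;` would show the message in the cell, and `r.data` would take the place of `obj`.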