Home > Back-end >  Cheerio(ver 13) doesn't work with US Census Table
Cheerio(ver 13) doesn't work with US Census Table

Time:12-04

Up to recently with the following codes in Javascript(Google Apps Script) I had been able to get data from https://www.census.gov/econ/currentdata/?programCode=VIP&startYear=2022&endYear=2022&categories[]=AXXXX&dataType=T&geoLevel=US&adjusted=1&notAdjusted=0&errorData=0. But all of sudden since sometime last month this codes doesn't work. I couldn't figure out what's wrong. Is there any change in Cheerio library? Can anyone help me? Thank you so much in advance for any help!

function test() {
  var url = "https://www.census.gov/econ/currentdata/?programCode=VIP&startYear=2022&endYear=2022&categories[]=AXXXX&dataType=T&geoLevel=US&adjusted=1&notAdjusted=0&errorData=0#table-results";
  var res = UrlFetchApp.fetch(url, { muteHttpExceptions: true }).getContentText();
  var $ = Cheerio.load(res); //version 13
  var data = $("table").find('td').toArray().map(el => $(el).text().replace(/,/g, ''));
  console.log(data);
}

CodePudding user response:

I believe your goal is as follows.

  • You want to retrieve the bottom table in the site of URL https://www.census.gov/econ/currentdata/?programCode=VIP&startYear=2022&endYear=2022&categories[]=AXXXX&dataType=T&geoLevel=US&adjusted=1&notAdjusted=0&errorData=0 using Google Apps Script.

Issue and workaround:

When I saw the HTML of your URL, the bottom table is not included. It seems that that is created by Javascript. But, unfortunately, I couldn't find the script. But, fortunately, when I saw the site, I can find the URL for downloading the table as CSV data. I thought that this URL might be able to be used. When this is reflected in a sample script, it becomes as follows.

Sample script:

function myFunction() {
  // This is from your URL.
  const url = "https://www.census.gov/econ/currentdata/?programCode=VIP&startYear=2022&endYear=2022&categories[]=AXXXX&dataType=T&geoLevel=US&adjusted=1&notAdjusted=0&errorData=0";

  // Convert your URL.
  const query = url.split("?").pop().split("&").reduce((o, e) => {
    const [k, v] = e.split("=");
    o[k == "programCode" ? "program" : k] = v;
    return o;
  }, {});
  const obj = { format: "csv", adjusted: true, notAdjusted: false, errorData: false, mode: "report", submit: "GET DATA" };
  const q = Object.entries(obj).reduce((o, [k, v]) => (o[k] = v, o), query);
  String.prototype.addQuery = function (obj) { // Ref: https://gist.github.com/tanaikech/70503e0ea6998083fcb05c6d2a857107
    return this   "?"   Object.entries(obj).flatMap(([k, v]) => Array.isArray(v) ? v.map(e => `${k}=${encodeURIComponent(e)}`) : `${k}=${encodeURIComponent(v)}`).join("&");
  }
  const convertedUrl = "https://www.census.gov/econ_export".addQuery(q);

  // Download table as CSV data.
  const res = UrlFetchApp.fetch(convertedUrl);
  const ar = Utilities.parseCsv(res.getContentText());
  const idx = ar.findIndex(([a, b]) => !a && !b);
  const temp = ar.splice(idx   1, ar.length);
  const result = temp[0].map((_, c) => temp.map(r => r[c]));
  console.log(result);
}
  • When this script is run, the following result is obtained.

      [
        ["Period","Jan-2022","Feb-2022","Mar-2022","Apr-2022","May-2022","Jun-2022","Jul-2022","Aug-2022","Sep-2022","Oct-2022","Nov-2022","Dec-2022"],
        ["Value","1726585","1753123","1768168","1780890","1793778","1803791","1817862","1797771","1800105","1794949","NA","NA"]
      ]
    
  • The URL of convertedUrl can be manually retrieved from the site. When you can use the manually retrieved URL, the script is simpler as follows.

      const res = UrlFetchApp.fetch("###URL###");
      const ar = Utilities.parseCsv(res.getContentText());
      const idx = ar.findIndex(([a, b]) => !a && !b);
      const temp = ar.splice(idx   1, ar.length);
      const result = temp[0].map((_, c) => temp.map(r => r[c]));
      console.log(result);
    

IMPORTANT:

  • This sample script is for the current HTML of your URL of https://www.census.gov/econ/currentdata/?programCode=VIP&startYear=2022&endYear=2022&categories[]=AXXXX&dataType=T&geoLevel=US&adjusted=1&notAdjusted=0&errorData=0. When you change your URL, this script might not be able to be used. And, when the specification of the site is changed, this script might not be able to be used. Please be careful about this.

Reference:

  • Related