Node.js app with Express, deployed on Heroku. It's just dynamic webpages. Loading static webpages works fine.
Loading dynamic webpages works on localhost, but on Heroku the request fails with code=H12, desc="Request timeout", service=30000ms, status=503.
In addition, right after running heroku restart or making a deployment, there always seems to be one request with status=200 that loads only the static portion of a dynamic webpage.
Screenshot of logs here.
I've tried the following, all of which led to either the same result or other unexpected results when deployed on Heroku (such as Error R14 (Memory quota exceeded) and code=H13 desc="Connection closed without response"):
- Switching the Puppeteer Heroku buildpack I was using. I've tried the ones mentioned in this troubleshooting guide and this comment.
- Adding headless: true to Puppeteer's launch arguments.
- Adding the --no-sandbox, --disable-setuid-sandbox, --single-process, and --no-zygote flags to args in Puppeteer's launch arguments. (Reference: this comment & this comment)
- Setting the waitUntil argument in Puppeteer's goto function to domcontentloaded, networkidle0 and networkidle2. (Reference: this comment)
- Passing a timeout argument to Puppeteer's goto function; I've tried 30000 and 60000 specifically, as well as 0 per this comment.
- Using the waitForSelector function.
- Clearing Heroku's build cache, as per this article.
- Printing the url variable (see my code below) in the console. Output is as expected.
I've observed that:
- With the code I have right now (see below), the try-catch-finally block never catches any error. It's always one of the following: I get an incomplete result (static portion of requested dynamic webpage), or the app crashes (code=H13 desc="Connection closed without response"). So I haven't been able to get anything out of attempting to print exception in the console from within the catch block.
Any ideas on how I could get this to work?
const express = require("express");
const puppeteer = require("puppeteer");
const app = express();
let port = process.env.PORT || 3000;
let browser;
...
app.listen(port, async () => {
  // Launch one shared browser when the server starts.
  browser = await puppeteer.launch({
    timeout: 0,
    headless: true,
    args: [
      "--no-sandbox",
      "--disable-setuid-sandbox",
      "--single-process",
      "--no-zygote",
    ],
  });
});
...
app.get("/appropriate-route-name", async (req, res) => {
  let url = req.query.url;
  let page = await browser.newPage();
  try {
    await page.goto(url, {
      waitUntil: "networkidle2",
    });
    res.send({ data: await page.content() });
  } catch (exception) {
    res.send({ data: null });
  } finally {
    // Note: this closes the shared browser, not just the page.
    await browser.close();
  }
});
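A side note on the handler above, offered as a guess rather than a diagnosis: browser.close() in finally shuts down the one browser launched in app.listen, so every request after the first calls newPage() on a closed browser. That call sits outside the try, and Express 4 does not catch rejections from async handlers, which might explain why the catch block never runs. Below is a minimal sketch of the alternative, closing only the page. renderPage and handleRender are hypothetical helper names, and the 25-second timeout is an assumed value chosen to stay under the 30 seconds shown in the H12 log line.

```javascript
// Hypothetical helper: render one URL in its own tab of a shared browser.
// `browser` is any object exposing Puppeteer's newPage() method.
async function renderPage(browser, url) {
  let page;
  try {
    page = await browser.newPage();
    // Assumed timeout: fail before the 30 s request timeout is reached.
    await page.goto(url, { waitUntil: "networkidle2", timeout: 25000 });
    return await page.content();
  } finally {
    // Close only this tab; the shared browser stays up for later requests.
    if (page) await page.close();
  }
}

// Hypothetical route body: every failure, including newPage(), lands in catch.
async function handleRender(browser, req, res) {
  try {
    res.send({ data: await renderPage(browser, req.query.url) });
  } catch (exception) {
    console.error(exception); // shows up in the application logs
    res.send({ data: null });
  }
}
```

The route would then become something like app.get("/appropriate-route-name", (req, res) => handleRender(browser, req, res)).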
CodePudding user response:
Was able to get it to work by using user-agents. Dynamic pages now load just fine on Heroku; requests don't time out every single time anymore.
const express = require("express");
const puppeteer = require("puppeteer");
const UserAgent = require("user-agents");
const app = express();
let port = process.env.PORT || 3000;
...
app.get("/route-name", async (req, res) => {
  let url = req.query.url;
  let browser = await puppeteer.launch({
    args: ["--no-sandbox"],
  });
  let page = await browser.newPage();
  try {
    // added this: send a randomly generated real-world user agent
    // instead of headless Chrome's default one
    await page.setUserAgent(new UserAgent().toString());
    await page.goto(url, {
      timeout: 30000,
      waitUntil: "networkidle2", // or "networkidle0", depending on what you need
    });
    res.send({ data: await page.content() });
  } catch (e) {
    res.send({ data: null });
  } finally {
    await browser.close();
  }
});
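One caveat with launching a browser per request, as the handler above does: each Chromium launch is memory-heavy, so under load this may bring back the R14 errors mentioned in the question. Here is a rough sketch of a lazily launched, shared browser that relaunches after a disconnect. createBrowserProvider is a hypothetical name, and launch stands in for something like () => puppeteer.launch({ args: ["--no-sandbox"] }); I have not measured this on Heroku.

```javascript
// Hypothetical helper: returns a function that yields one shared browser,
// launching it on first use and again after it disconnects.
function createBrowserProvider(launch) {
  let browserPromise = null;
  return function getBrowser() {
    if (!browserPromise) {
      browserPromise = launch().then((browser) => {
        // Puppeteer browsers emit "disconnected" when Chromium closes or crashes.
        browser.on("disconnected", () => {
          browserPromise = null;
        });
        return browser;
      });
      // Do not cache a failed launch; the next call should retry.
      browserPromise.catch(() => {
        browserPromise = null;
      });
    }
    return browserPromise;
  };
}
```

In the route, const browser = await getBrowser(); would replace the per-request puppeteer.launch(...) call, with page.close() rather than browser.close() in finally.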