I'm trying to get a specific element from the DOM of a webpage using Node.js. To do that I used jsdom, and everything works as expected with regular webpages: I can see the DOM in Node.js and select elements from it.
The problem is with pages like this one: when you open it, an initial page loads, and then after it fetches new data the page updates. Only at that point does my desired DOM element appear. My code prints the initial DOM structure of the site, so I can't get that specific element, since it's only added to the page about 5 seconds later.
How can I wait for the website to be fully rendered and updated, and then get its DOM?
Here is my code:
const jsdom = require('jsdom');
const { JSDOM } = jsdom;

const url =
  'https://www.flytoday.ir/flight/search?departure=THR,1&arrival=MUC,1&departureDate=2022-09-25&adt=1&chd=0&inf=0&cabin=1';

JSDOM.fromURL(url).then((dom) => {
  // forEach returns undefined, so there's no point wrapping it in
  // console.log; log each element's innerHTML directly instead.
  // This only ever sees the initial HTML, before the page updates.
  dom.window.document
    .querySelectorAll('*')
    .forEach((e) => console.log(e.innerHTML));
});
CodePudding user response:
You need a headless browser. Simple scraping tools like jsdom or cheerio in Node.js parse HTML, but they don't execute the page's JavaScript the way a real browser does (cheerio never runs scripts at all, and jsdom's script support is limited, nothing like a full browser), so content the page fetches and inserts after the initial load never shows up in the DOM they hand you. That's why you can't do what you want with them. Back when I wanted to convert fully-rendered AngularJS pages into PDFs, we had to use PhantomJS.
Nowadays that project has fallen out of favor, since we have Headless Chrome. On top of it there's Puppeteer (credit: GrafiCode), a Node.js library that drives headless Chrome and gives you a much simpler API for exactly this kind of job.
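Here's a minimal sketch of that approach with Puppeteer. Note that '.flight-result' is a placeholder selector I made up, since I don't know which element you're after; swap in the selector for the element that only appears after the page updates:

// npm install puppeteer
const puppeteer = require('puppeteer');

const url =
  'https://www.flytoday.ir/flight/search?departure=THR,1&arrival=MUC,1&departureDate=2022-09-25&adt=1&chd=0&inf=0&cabin=1';

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Load the page and wait until the network goes quiet, i.e. the
  // page's follow-up fetch/XHR calls have finished.
  await page.goto(url, { waitUntil: 'networkidle0' });

  // Block until the dynamically inserted element actually exists.
  await page.waitForSelector('.flight-result', { timeout: 30000 });

  // The DOM now includes the content added after the initial load.
  const html = await page.$eval('.flight-result', (el) => el.innerHTML);
  console.log(html);

  await browser.close();
})();

page.waitForSelector is the key piece here: it's the "wait for the page to update" step that jsdom has no equivalent for.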
The other option, which has been around for a long time and is very powerful, is Selenium. It's the right tool if you want to know how your UI code runs and looks across different browsers, which is probably more than you need here, but I figured I'd give you the info I had on the topic.
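For completeness, the equivalent wait in Node's selenium-webdriver looks roughly like this (same caveat: '.flight-result' is a made-up selector, and you'll need a driver binary such as chromedriver installed):

// npm install selenium-webdriver
const { Builder, By, until } = require('selenium-webdriver');

const url =
  'https://www.flytoday.ir/flight/search?departure=THR,1&arrival=MUC,1&departureDate=2022-09-25&adt=1&chd=0&inf=0&cabin=1';

(async () => {
  const driver = await new Builder().forBrowser('chrome').build();
  try {
    await driver.get(url);
    // Wait up to 30s for the dynamically added element to appear.
    const el = await driver.wait(
      until.elementLocated(By.css('.flight-result')),
      30000
    );
    console.log(await el.getAttribute('innerHTML'));
  } finally {
    await driver.quit();
  }
})();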