I want to crawl my SPA built by the Vue framework (Relatively same as React framework). However, I see that the content is not rendered while crawling. The result is:
<!doctype html>
<HTML>
<body>
<div id=app>
</div>
<script type=text/javascript src=/static/js/manifest.2ae2e69a05c33dfc65f8.js></script>
<script type=text/javascript src=/static/js/vendor.60c471696de493d48a1c.js></script>
<script type=text/javascript src=/static/js/app.335a9e9866cb7dc6a517.js></script>
</body>
</html>
Are the component-based javascript frameworks anti crawling? How can I make the component to be rendered by the crawler?
I'm using Abot
framework for crawling propose
CodePudding user response:
All Abot does is send a request to the target website, parse the data, and pass it back to you. As you probably know, frameworks like React or Vue are 100% JavaScript based, meaning no data will be rendered unless you run the JavaScript. So the solution here is to launch a headless browser or another DOM engine and scrape the data.
Several engines you could use are Selenium (browser automation framework available in Python and some other languages), Puppeteer (Chromium-based web-scraper in NodeJS), or a DOM engine like JSDOM.
Moral of the story is: if you want to see result rendered by JavaScript you must execute the JavaScript inside a DOM.