How to crawl component-based web applications built by Vue and React?

Time:03-08

I want to crawl my SPA, which is built with the Vue framework (React would behave much the same). However, the content is not rendered when I crawl it. The result is:

 <!doctype html>
 <HTML>
  <body>
   <div id=app>
   </div>
   <script type=text/javascript src=/static/js/manifest.2ae2e69a05c33dfc65f8.js></script>
   <script type=text/javascript src=/static/js/vendor.60c471696de493d48a1c.js></script>
   <script type=text/javascript src=/static/js/app.335a9e9866cb7dc6a517.js></script>
  </body>
 </html>

Are component-based JavaScript frameworks anti-crawling? How can I get the components rendered for the crawler?

I'm using the Abot framework for crawling.

Answer:

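All Abot does is send an HTTP request to the target website, parse the HTML it gets back, and pass the data to you; it does not execute JavaScript. A client-side-rendered Vue or React app, however, ships little more than an empty shell (like the empty <div id=app> above) plus script bundles, and builds the actual page content in the browser by running those scripts. So no content will be rendered unless the JavaScript runs. The solution is to load the page in a headless browser (or another DOM engine that can execute JavaScript) and scrape the rendered result.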
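To see the problem from the crawler's side, here is a minimal Python sketch (standard library only) that inspects the HTML a non-JavaScript crawler such as Abot receives. It assumes the app mounts into an element with id "app", as in the question; `ShellInspector` and `inspect_shell` are hypothetical names for illustration, and the nesting-depth tracking is a rough heuristic, not a full HTML5 parser.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ShellInspector(HTMLParser):
    """Records what a crawler that does not run JavaScript actually receives:
    whether the SPA mount element has any content, and which external
    scripts would have to run to produce it."""

    def __init__(self, mount_id="app"):
        super().__init__()
        self.mount_id = mount_id
        self.depth = 0              # > 0 while parsing inside the mount element
        self.mount_found = False
        self.mount_has_content = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("src"):
            self.scripts.append(attrs["src"])
        if self.depth:
            # Any element nested in the mount point counts as rendered content.
            self.mount_has_content = True
            if tag not in VOID_TAGS:
                self.depth += 1
        elif attrs.get("id") == self.mount_id:
            self.mount_found = True
            self.depth = 1

    def handle_endtag(self, tag):
        # Void tags were never counted, so their (self-)closing is ignored too.
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        # Non-whitespace text inside the mount point also counts as content.
        if self.depth and data.strip():
            self.mount_has_content = True


def inspect_shell(html, mount_id="app"):
    """Return (mount_found, mount_has_content, script_srcs) for raw HTML."""
    parser = ShellInspector(mount_id)
    parser.feed(html)
    parser.close()
    return parser.mount_found, parser.mount_has_content, parser.scripts


if __name__ == "__main__":
    page = '<div id=app></div><script src=/static/js/app.js></script>'
    print(inspect_shell(page))  # (True, False, ['/static/js/app.js'])
```

Run against the page from the question, it should report that the app element exists but is empty and that three scripts would have to execute to fill it. A browser does that execution; Abot does not.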

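Several engines you could use are Selenium (a browser-automation framework with bindings for Python, Java, C#, JavaScript and other languages), Puppeteer (a Node.js library for driving headless Chrome/Chromium), or a standalone DOM implementation such as jsdom (which only runs a page's scripts when explicitly configured to).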
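A minimal sketch of the headless-browser approach using Selenium's Python bindings, assuming a recent Selenium 4 release (which can locate a matching ChromeDriver by itself), a local Chrome install, and an app that mounts into the element with id "app". The function names `mount_point_rendered` and `fetch_rendered_html` are hypothetical. Since Abot is a .NET crawler, note that Selenium's C# bindings offer the same pieces (ChromeOptions, WebDriverWait, PageSource), so the flow should port directly.

```python
def mount_point_rendered(driver, mount_id="app"):
    """Wait condition for WebDriverWait: true once the SPA has put content
    into its mount element (assumed here to be <div id="app">)."""
    inner = driver.execute_script(
        "const el = document.getElementById(arguments[0]);"
        "return el ? el.innerHTML : '';",
        mount_id,
    )
    return bool(inner and inner.strip())


def fetch_rendered_html(url, mount_id="app", timeout=10):
    """Load `url` in headless Chrome, let the app's JavaScript run, and
    return the resulting DOM serialized as HTML."""
    # Imported here so the wait condition above can be reused (and tested)
    # without Selenium or a browser installed.
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run Chrome without a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the framework has rendered something into the mount
        # point; raises TimeoutException if that never happens.
        WebDriverWait(driver, timeout).until(
            lambda d: mount_point_rendered(d, mount_id)
        )
        return driver.page_source  # the DOM after JavaScript has run
    finally:
        driver.quit()
```

Waiting on the mount point instead of sleeping for a fixed time avoids both scraping too early and waiting longer than necessary; the returned HTML can then be parsed like any static page.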

The moral of the story: if you want to see content rendered by JavaScript, you must execute that JavaScript in a DOM environment.
