I'm currently working on a project that scrapes grocery store pages for data given a search query (e.g., cereal) and displays the results in a Spinner view. However, I'm having some difficulty finding a way to scrape the data off the pages. I tried using Jsoup, as that was the consensus online, but it doesn't execute JavaScript.
The issue is that most, if not all, sites like these use JavaScript and DOM storage to load up-to-date stock listings and prices. That's why libraries like Jsoup won't work: they return only the HTML as served, before any JavaScript has run. I currently have a prototype that displays the page in a WebView, but I see no way of getting the data out of it.
I've tried to research how to get around this, but honestly it's quite confusing, and I haven't found an elegant solution, if one even exists.
If anyone can help, or at the very least point me in the right direction, that would be most appreciated! Thanks ^_^
CodePudding user response:
Selenium would be a good option for web scraping: https://www.selenium.dev/ It drives a real browser, so it has access to the website's DOM after the page's JavaScript has run. In my experience, a dynamically generated web page can be difficult to scrape, so regular expressions will be your friend for pulling values out of the rendered markup: https://regexone.com/
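To make the regex step concrete, here is a minimal sketch in plain Java. It assumes you already have the rendered HTML as a string (from Selenium or any other tool that runs the page's JavaScript) and that prices appear as `<span class="price">$3.99</span>`. That markup, the `PriceExtractor` class, and the `extractPrices` method are all made up for illustration, so the pattern would need adjusting to whatever the real site emits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls price strings out of already-rendered HTML.
final class PriceExtractor {

    // Assumed markup: <span class="price">$3.99</span>
    // Group 1 captures the numeric part, e.g. "3.99".
    private static final Pattern PRICE = Pattern.compile(
            "<span class=\"price\">\\s*\\$(\\d+\\.\\d{2})\\s*</span>");

    private PriceExtractor() {
        // Static utility class; not meant to be instantiated.
    }

    // Returns every captured price in document order.
    static List<String> extractPrices(String renderedHtml) {
        List<String> prices = new ArrayList<>();
        Matcher matcher = PRICE.matcher(renderedHtml);
        while (matcher.find()) {
            prices.add(matcher.group(1));
        }
        return prices;
    }
}
```

Calling `PriceExtractor.extractPrices(html)` gives you a list of strings you could feed into the Spinner's adapter. Keep in mind that regular expressions are brittle against markup changes, so this is only reasonable for small, predictable fragments like a single price tag.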