Example:
A website has URLs of the form https://images.com/Robots.aspx?ID=xxxx, where xxxx is an integer between 1 and 1935.
Each page may contain an `<img src="Images\Robots\{robotname}.png">` element, but not all pages have it.
I need to extract every existing {robotname} value and then download the images, but I'm struggling to understand how to store the element in an object (in Python or JS, for example).
How do I start, and what can I read to learn how to do this?
CodePudding user response:
In Python, you can use BeautifulSoup to extract all img tags with `soup.find_all("img")` and work with the data from there.
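As a minimal sketch of that idea (not a complete scraper): the function below parses one page's HTML and returns the robot names found in its img tags. The helper name `robot_names` and the sample HTML are made up for illustration, and it assumes the `src` values look exactly like the `Images\Robots\{robotname}.png` pattern from the question.

```python
from pathlib import PureWindowsPath

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4


def robot_names(html: str) -> list[str]:
    """Return the robot names from <img src="Images\\Robots\\NAME.png"> tags."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for img in soup.find_all("img"):
        src = img.get("src", "")
        # PureWindowsPath splits on both "\" and "/", on any operating system.
        path = PureWindowsPath(src)
        if path.suffix.lower() == ".png" and path.parent.name.lower() == "robots":
            names.append(path.stem)
    return names


# Made-up sample page: one robot image and one unrelated image.
sample = '<html><body><img src="Images\\Robots\\R2D2.png"><img src="logo.gif"></body></html>'
print(robot_names(sample))  # ['R2D2']
```

Pages without the element simply give an empty list, so the same function can be called on every ID without special-casing.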
CodePudding user response:
- Download each page in a loop with AJAX.
- Parse the DOM with something like jsdom.
- Use a selector with [`querySelectorAll()`](https://developer.mozilla.org/en-US/docs/Web/API/Document/querySelectorAll) to get each image element.
- Use a regular expression on the image's `src` attribute to get the robot name, for example: `$img.src.match(/([^\/]+)\.png$/i)[1]`
- Download all the robots with AJAX.
- Combine each robot name and its downloaded image into an object of key-value pairs.
Let me know if you need more help or a code example.
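For reference, here is a rough sketch of the same steps in Python rather than JavaScript, using only the standard library and a regular expression in place of jsdom and `querySelectorAll()`. The names `fetch` and `collect_robots` are hypothetical, and the image URL layout (`Images/Robots/{robotname}.png` relative to the site root) is an assumption based on the `src` value in the question, so adjust it if the real site differs.

```python
import re
from typing import Callable
from urllib.parse import quote
from urllib.request import urlopen

BASE = "https://images.com/"
# Captures NAME from src="Images\Robots\NAME.png" (either slash direction).
IMG_RE = re.compile(
    r'<img[^>]*\bsrc="Images[\\/]Robots[\\/]([^"\\/]+)\.png"', re.IGNORECASE
)


def fetch(url: str) -> bytes:
    """Download a URL and return the raw response body."""
    with urlopen(url, timeout=10) as resp:
        return resp.read()


def collect_robots(ids, get: Callable[[str], bytes] = fetch) -> dict[str, bytes]:
    """Map each robot name found on the given page IDs to its image bytes."""
    robots: dict[str, bytes] = {}
    for page_id in ids:
        try:
            raw = get(f"{BASE}Robots.aspx?ID={page_id}")
        except OSError:
            continue  # skip pages that fail to load (HTTP errors, timeouts)
        html = raw.decode("utf-8", errors="replace")
        for name in IMG_RE.findall(html):
            if name not in robots:
                # Assumed image location, relative to the site root.
                robots[name] = get(f"{BASE}Images/Robots/{quote(name)}.png")
    return robots
```

Calling `collect_robots(range(1, 1936))` would walk every ID from the question and return a dict of robot name to image bytes, which you can then write to disk. The `get` parameter is only there so the function can be tried with a fake downloader before pointing it at the real site.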