I want to download all the pictures from this site in high resolution, not the preview pictures:
https://www.booklooker.de/Bücher/Donna-W-Cross Die-Päpstin/id/A02A8f9001ZZl
The link (https://xxxxx.de) to the images I want to download is stored in this part of the HTML code: link to the picture
The code I have tried so far is this:
from bs4 import BeautifulSoup
import requests

page = requests.get("https://www.booklooker.de/Bücher/Donna-W-Cross Die-Päpstin/id/A02A8f9001ZZl")
souped = BeautifulSoup(page.content, "html.parser")
# every element with the classes "preview hasXXL" is a link around a preview picture
for pic in souped.find_all(class_="preview hasXXL"):
    print(pic['href'])
With that I reach the right part of the HTML, but I don't understand how to extract the image link from the href attribute. When I try to scrape it, I get this result:
/app/detail.php?id=A02A8f9001ZZl&picNo=1" id="preview_1
But I expect this:
https://images.booklooker.de/x/02Sh07/Donna-W-Cross Die-Päpstin.jpg
What did I do wrong?
Thanks a lot for your help!!
CodePudding user response:
Since you're trying to download images, you can search for the <img> tag and use its src attribute, which holds the actual image URL.
Your modified code:
from bs4 import BeautifulSoup
import requests

page = requests.get("https://www.booklooker.de/Bücher/Donna-W-Cross Die-Päpstin/id/A02A8f9001ZZl")
souped = BeautifulSoup(page.content, "html.parser")
# select the <img> elements themselves instead of the surrounding links
for pic in souped.find_all("img", class_="previewImage"):
    print(pic["src"])
Output:
https://images.booklooker.de/t/02Sh07/Donna-W-Cross Die-Päpstin.jpg
https://images.booklooker.de/t/02Sh08/Donna-W-Cross Die-Päpstin.jpg
...
https://images.booklooker.de/t/02Sh0S/Donna-W-Cross Die-Päpstin.jpg
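Note that the URLs printed above contain /t/, while the URL expected in the question contains /x/ in the same position. Assuming (from these two examples only, not verified against the site) that /t/ marks the thumbnail and /x/ the high-resolution version, a minimal sketch for converting one into the other could look like this. The helper name to_high_res is made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def to_high_res(thumb_url: str) -> str:
    """Swap the first path segment "t" for "x" (assumed thumbnail -> full size)."""
    parts = urlsplit(thumb_url)
    # "/t/02Sh07/name.jpg" -> ["", "t", "02Sh07", "name.jpg"]
    segments = parts.path.split("/")
    if len(segments) > 1 and segments[1] == "t":
        segments[1] = "x"
    # URLs that do not match the assumed pattern are returned unchanged
    return urlunsplit(parts._replace(path="/".join(segments)))


thumb = "https://images.booklooker.de/t/02Sh07/Donna-W-Cross Die-Päpstin.jpg"
print(to_high_res(thumb))
```

Inside the loop you would then call to_high_res(pic["src"]) instead of printing pic["src"] directly. If the site uses a different scheme for some images, the resulting URL may not exist, so check the response status before saving anything.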
CodePudding user response:
If you want the image URLs (e.g. https://images.booklooker.de/t/02Sh07/Donna-W-Cross Die-Päpstin.jpg), then you need to select the previewImage elements in the HTML (not the "preview hasXXL" class) and extract the "src" attribute from each img element.
from bs4 import BeautifulSoup
import requests

url = "https://www.booklooker.de/Bücher/Donna-W-Cross Die-Päpstin/id/A02A8f9001ZZl"
page = requests.get(url)
souped = BeautifulSoup(page.content, "html.parser")
for pic in souped.find_all(class_="previewImage"):
    # resolve any relative URLs to absolute URLs using the page URL as base
    src = requests.compat.urljoin(url, pic['src'])
    print(src)
Output:
https://images.booklooker.de/t/02Sh07/Donna-W-Cross Die-Päpstin.jpg
...
https://images.booklooker.de/t/02Sh0S/Donna-W-Cross Die-Päpstin.jpg
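Printing the URLs is only half of what the question asks for, so here is a rough sketch of saving the files with the standard library. Two things are visible in the output above: every image has the same file name and differs only in the directory before it (02Sh07, ..., 02Sh0S), and the URLs contain a space and an umlaut. The sketch therefore builds the local name from the last two path segments and percent-encodes the path before requesting it. The function names, the "images" target folder and the 30 second timeout are my own choices, and I have not checked whether the image server requires particular headers.

```python
from pathlib import Path
from urllib.parse import quote, urlsplit, urlunsplit
from urllib.request import urlopen


def encode_url(url: str) -> str:
    """Percent-encode the path so spaces and umlauts are valid in a request."""
    parts = urlsplit(url)
    # keep "/" and already-encoded "%xx" sequences untouched
    return urlunsplit(parts._replace(path=quote(parts.path, safe="/%")))


def local_name(url: str) -> str:
    """Combine the image id directory and the file name into a unique name."""
    segments = urlsplit(url).path.split("/")
    return "_".join(segments[-2:])


def download(url: str, target_dir: str = "images") -> Path:
    """Fetch one image and write it below target_dir; returns the file path."""
    folder = Path(target_dir)
    folder.mkdir(exist_ok=True)
    target = folder / local_name(url)
    with urlopen(encode_url(url), timeout=30) as response:
        target.write_bytes(response.read())
    return target
```

Calling download(src) inside the loop above would then store each picture as, for example, images/02Sh07_Donna-W-Cross Die-Päpstin.jpg.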