BeautifulSoup | not able to iterate through div [ js-content-images ] class tag

Time:04-09

Refer to the image below for reference. I want to scrape the names of the diseases, the URLs associated with them, and their icon images, but I am not able to iterate through the div [ js-content-images ] tag!

import requests
from bs4 import BeautifulSoup

URL = "https://dermnetnz.org/image-library"
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")

job_elements = soup.find("div", class_="flex [ js-sticky-container ]")

job2 = job_elements.find_all("div", class_="imageList__group")

for job_element in job2:
    print(job_element)

Answer:

You don't need bs4 or Selenium to scrape this page. If you open the Network tab in your browser's developer tools, you will find the JSON URL the page loads its data from. Send a request to that URL and parse the JSON response:

https://dermnetnz.org/image-library/imagesJson

Code:

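import requests

# the image library page fetches its data from this JSON endpoint
res = requests.get("https://dermnetnz.org/image-library/imagesJson")
res.raise_for_status()
result = res.json()
for r in result:
    print("Diseases Name : " + r['name'])
    print("Image : " + r['thumbnail'])
    # 'url' is site-relative (e.g. /topics/...), so prepend the domain
    print("Url : " + "https://dermnetnz.org" + r['url'])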

Output:

Diseases Name : Roseola images
Image : https://dermnetnz.org/assets/Uploads/roseola-001__FocusFillWzE1MCwxMTAsInkiLDFd.jpg
Url : https://dermnetnz.org/topics/roseola-images/?stage=Live
Diseases Name : Dermatomyositis images
Image : https://dermnetnz.org/assets/Uploads/dermatomyositis-eyelids-4__FocusFillWzE1MCwxMTAsIngiLDhd.jpg
Url : https://dermnetnz.org/topics/dermatomyositis-images/?stage=Live
Diseases Name : Solar keratosis affecting the face images
Image : https://dermnetnz.org/assets/Uploads/248__FocusFillWzE1MCwxMTAsInkiLDFd.jpg
Url : https://dermnetnz.org/topics/actinic-keratosis-face-images/?stage=Live
Diseases Name : Actinic keratosis affecting the face images
Image : https://dermnetnz.org/assets/Uploads/248__FocusFillWzE1MCwxMTAsInkiLDFd.jpg
Url : https://dermnetnz.org/topics/actinic-keratosis-face-images/?stage=Live
Diseases Name : Solar keratosis affecting the hand images
Image : https://dermnetnz.org/assets/Uploads/393__FocusFillWzE1MCwxMTAsInkiLDFd.jpg
Url : https://dermnetnz.org/topics/actinic-keratosis-affecting-the-hand-images/?stage=Live
Diseases Name : Solar keratosis affecting the legs and feet images
Image : https://dermnetnz.org/assets/Uploads/478__FocusFillWzE1MCwxMTAsInkiLDFd.jpg
Url : https://dermnetnz.org/topics/actinic-keratosis-leg-and-foot-images/?stage=Live
Diseases Name : Solar keratosis affecting the scalp images
Image : https://dermnetnz.org/assets/Uploads/418__FocusFillWzE1MCwxMTAsInkiLDFd.jpg
Url : https://dermnetnz.org/topics/actinic-keratosis-scalp-images/?stage=Live
Diseases Name : Solar keratosis on the nose images
Image : https://dermnetnz.org/assets/Uploads/sks-nose3-s__FocusFillWzE1MCwxMTAsInkiLDFd.jpg
Url : https://dermnetnz.org/topics/actinic-keratosis-on-the-nose-images/?stage=Live
Diseases Name : Solar keratosis treated with imiquimod images
Image : https://dermnetnz.org/assets/Uploads/3723__FocusFillWzE1MCwxMTAsInkiLDFd.jpg
Url : https://dermnetnz.org/topics/actinic-keratosis-imiquimod-images/?stage=Live
Diseases Name : Autoimmune alopecia images
Image : https://dermnetnz.org/assets/Uploads/1323__FocusFillWzE1MCwxMTAsInkiLDIzXQ.jpg
Url : https://dermnetnz.org/topics/alopecia-areata-images/?stage=Live
Diseases Name : Hypomelanotic malignant melanoma images
Image : https://dermnetnz.org/assets/Uploads/12a-amelanotic-melanoma__FocusFillWzE1MCwxMTAsInkiLDFd.jpg
Url : https://dermnetnz.org/topics/amelanotic-melanoma-images/?stage=Live
Diseases Name : Epiloia images
Image : https://dermnetnz.org/assets/Uploads/angiofibromas-19-s__FocusFillWzE1MCwxMTAsInkiLDFd.jpg
Url : https://dermnetnz.org/topics/tuberous-sclerosis-images/?stage=Live
Diseases Name : Perleche images
Image : https://dermnetnz.org/assets/Uploads/perleche13-s__FocusFillWzE1MCwxMTAsInkiLDFd.jpg
Url : https://dermnetnz.org/topics/angular-cheilitis-images/?stage=Live
Diseases Name : Besnier prurigo images
Image : https://dermnetnz.org/assets/Uploads/atopic26-s__FocusFillWzE1MCwxMTAsInkiLDFd.jpg
Url : https://dermnetnz.org/topics/atopic-dermatitis-images/?stage=Live
Diseases Name : Atopic eczema images
Image : https://dermnetnz.org/assets/Uploads/atopic26-s__FocusFillWzE1MCwxMTAsInkiLDFd.jpg
Url : https://dermnetnz.org/topics/atopic-dermatitis-images/?stage=Live
Diseases Name : Atypical melanocytic naevus
Image : https://dermnetnz.org/assets/Uploads/604__FocusFillWzE1MCwxMTAsInkiLDFd.jpg
Url : https://dermnetnz.org/topics/atypical-naevus-images/?stage=Live
Diseases Name : Bacteria images
Image : https://dermnetnz.org/assets/Uploads/syph6-s-2__FocusFillWzE1MCwxMTAsInkiLDFd.jpg
Url : https://dermnetnz.org/image-catalogue/bacterial-skin-infection-images/?stage=Live

... and so on
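Several entries in this output are synonyms of one topic and share a URL (for example "Besnier prurigo images" and "Atopic eczema images"). If you want to keep the results instead of printing them, here is a minimal sketch that fetches the same JSON endpoint using only the standard library, merges entries by URL, and writes a CSV file. It assumes every record has the name, thumbnail and url keys used above; the function names and the dermnet.csv filename are hypothetical, and the browser-like User-Agent header is an assumption in case the server rejects urllib's default one.

```python
import csv
import json
import urllib.request
from urllib.parse import urljoin

BASE = "https://dermnetnz.org"
JSON_URL = BASE + "/image-library/imagesJson"


def fetch_records(url=JSON_URL, timeout=30):
    """Download the JSON list the image library page loads (assumed: a list of dicts)."""
    # Assumption: a browser-like User-Agent, in case the default one is rejected
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def group_by_url(records):
    """Merge synonym entries: one row per topic URL, keeping every name seen for it."""
    grouped = {}
    for r in records:
        # r["url"] is site-relative, so resolve it against the site root
        url = urljoin(BASE, r["url"])
        row = grouped.setdefault(url, {"url": url, "thumbnail": r["thumbnail"], "names": []})
        if r["name"] not in row["names"]:
            row["names"].append(r["name"])
    return list(grouped.values())


def write_csv(rows, path):
    """Write one CSV line per topic; synonym names are joined with '; '."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["names", "url", "thumbnail"])
        for row in rows:
            writer.writerow(["; ".join(row["names"]), row["url"], row["thumbnail"]])


if __name__ == "__main__":
    rows = group_by_url(fetch_records())
    write_csv(rows, "dermnet.csv")
    print(f"wrote {len(rows)} topics")
```

group_by_url keeps the first thumbnail it sees for each URL; skip the grouping step if you would rather have one row per name.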

Answer:

You can't find those elements because they are loaded through JavaScript: the site is dynamic, and the HTML that requests downloads does not contain them yet. You can verify this by disabling JavaScript in your browser and reloading the page; the images will no longer appear.
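A quick way to confirm this without a browser is to check whether the item class appears in the raw HTML an HTTP client receives. The sketch below uses the standard library's html.parser to count elements carrying a given class. The class name imageList__group__item is taken from the Selenium code later in this answer, the helper names are hypothetical, and the User-Agent header is an assumption. A count of 0 on the live page would be consistent with the items being injected by JavaScript.

```python
import urllib.request
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags whose class attribute contains a given class token."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # class may hold several space-separated tokens, e.g. "flex [ js-sticky-container ]"
            if name == "class" and value and self.class_name in value.split():
                self.count += 1


def count_class(html, class_name):
    """Return how many elements in html carry class_name."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count


if __name__ == "__main__":
    req = urllib.request.Request(
        "https://dermnetnz.org/image-library", headers={"User-Agent": "Mozilla/5.0"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    print("items in static HTML:", count_class(html, "imageList__group__item"))
```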

You have two options: reverse-engineer the JavaScript to find out where the data comes from, or render the page with a browser engine that executes the JavaScript for you.

For the second option there is Selenium, whose Python bindings you can install with pip install selenium. Follow the Selenium installation instructions for your system, since you'll also need a browser driver such as Geckodriver (Firefox) or ChromeDriver (Chrome).

You may have to adapt the following code slightly for your setup, but it finds the first element you want:

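# setting up (Selenium 4 API: the find_element_by_* helpers and the
# chrome_options argument no longer exist in current releases)
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome(options=chrome_options)

# your own application
driver.get('https://dermnetnz.org/image-library')
# the items are inserted by JavaScript, so wait for the first one to appear
element = WebDriverWait(driver, 20).until(
    EC.presence_of_element_located((By.CLASS_NAME, 'imageList__group__item'))
)
img_element = element.find_element(By.TAG_NAME, 'img')

# here is the link:
print(element.get_attribute('href'))
# here is the text:
print(element.text)
# here is the img source:
print(img_element.get_attribute('src'))

driver.quit()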

Want to find all of them? Use the plural form, find_elements instead of find_element, with the same class name, then loop through the returned list and find the img element inside each item.
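That loop might look like the following sketch. It assumes, as the snippet above does, that each imageList__group__item element is a link wrapping an img; collect_items is a hypothetical helper. It passes the string "tag name", which is the value of Selenium's By.TAG_NAME constant, so the helper itself does not need Selenium imported and can be reused with any object exposing the same WebElement methods.

```python
def collect_items(elements):
    """Turn item elements into (name, link, image_src) tuples.

    Works with any objects exposing Selenium's WebElement interface:
    .text, .get_attribute(name) and .find_element(by, value).
    """
    items = []
    for element in elements:
        # "tag name" is the value of selenium.webdriver.common.by.By.TAG_NAME
        img = element.find_element("tag name", "img")
        items.append((element.text.strip(), element.get_attribute("href"), img.get_attribute("src")))
    return items


if __name__ == "__main__":
    # Imported here so collect_items can be used without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get("https://dermnetnz.org/image-library")
        # Wait until the JavaScript has inserted at least one item
        WebDriverWait(driver, 20).until(
            EC.presence_of_element_located((By.CLASS_NAME, "imageList__group__item"))
        )
        elements = driver.find_elements(By.CLASS_NAME, "imageList__group__item")
        for name, link, src in collect_items(elements):
            print(name, link, src, sep=" | ")
    finally:
        driver.quit()
```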
