Scraping images in nested divs

Time:01-06

I am trying to scrape the images from a personal Imgur gallery, https://imgur.com/a/FIR1BL1, so I can format them and prepare them for linking on my website. I want a list of all the image links, but for some reason I can't get any. I also tried a CSS selector, with no luck. I suspect it might be because the images are too deeply nested. I also don't have much experience with scraping.

This is what I came up with using Python and BeautifulSoup:

import requests
from bs4 import BeautifulSoup

# Make a GET request to the website
r = requests.get("https://imgur.com/a/FIR1BL1")

# Parse the HTML content
soup = BeautifulSoup(r.content, 'html.parser')

# Find every "div" with class "PostContent-imageWrapper-rounded"
# (find_all returns a list of matches, not a single element)
divs = soup.find_all("div", class_="PostContent-imageWrapper-rounded")

if divs:
    # Print the src attribute of each "img" element inside each div
    for div in divs:
        for img in div.find_all('img'):
            print(img['src'])
else:
    print("Div not found")

CodePudding user response:

You can try to use their API:

import requests

# FIR1BL1 is the album name
url = "https://api.imgur.com/post/v1/albums/FIR1BL1?client_id=546c25a59c58ad7&include=media"

data = requests.get(url).json()

for m in data['media']:
    print(m['url'])

Prints:

https://i.imgur.com/q4UuhEq.jpeg
https://i.imgur.com/WFVRr9Q.jpeg
https://i.imgur.com/QSl0OpM.jpeg
https://i.imgur.com/0yKgw0Y.jpeg
https://i.imgur.com/BV2JfUw.jpeg
https://i.imgur.com/hITF8Y9.jpeg
https://i.imgur.com/HxQDu52.jpeg
https://i.imgur.com/S13WUFn.jpeg
https://i.imgur.com/MDEN7G6.jpeg
https://i.imgur.com/HNuWMOw.jpeg
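
The loop above raises a KeyError if the response has no 'media' key, for example when the request is rejected. A slightly more defensive version might look like the sketch below. The helper name extract_media_urls and the sample payload are hypothetical; the payload only mimics the 'media' / 'url' shape used above, not the full API response.

```python
def extract_media_urls(payload):
    """Return the 'url' of every media entry, skipping malformed entries."""
    media = payload.get("media") or []
    return [m["url"] for m in media if isinstance(m, dict) and "url" in m]


# Hypothetical sample shaped like the part of the response used above
sample = {
    "media": [
        {"url": "https://i.imgur.com/q4UuhEq.jpeg"},
        {"url": "https://i.imgur.com/WFVRr9Q.jpeg"},
        {"type": "image"},  # entry without a 'url' key is skipped
    ],
}

urls = extract_media_urls(sample)
for u in urls:
    print(u)
```

With the real response, this would be called as extract_media_urls(data), where data comes from requests.get(url).json() as shown above.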

CodePudding user response:

You are not finding them because they are not in the HTML that requests downloads. The images are loaded afterwards from the Imgur API. To see the request that loads them:

  1. Open a new tab.
  2. Open the developer tools and go to the Network tab.
  3. Open your Imgur link in that tab (https://imgur.com/a/FIR1BL1 is the one you have).
  4. Use the search to find this request: https://api.imgur.com/post/v1/albums/FIR1BL1 or something similar.
  5. This request has the data you are looking for. Try to reconstruct a similar request and call .json() on the response to parse it.
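
As a rough sketch of that last step, the API URL can be rebuilt from the gallery link. The helper name album_api_url is hypothetical, and the client_id and include=media parameters are copied from the first answer; whether that client ID stays valid is an assumption.

```python
from urllib.parse import urlencode, urlparse

# Endpoint prefix as seen in the Network tab (step 4)
API_BASE = "https://api.imgur.com/post/v1/albums/"


def album_api_url(gallery_url, client_id):
    """Build the album API URL from a gallery link such as https://imgur.com/a/<id>."""
    parts = [p for p in urlparse(gallery_url).path.split("/") if p]
    if len(parts) != 2 or parts[0] != "a":
        raise ValueError(f"Not an album link: {gallery_url}")
    query = urlencode({"client_id": client_id, "include": "media"})
    return f"{API_BASE}{parts[1]}?{query}"


# client_id taken from the first answer (assumed to still work)
url = album_api_url("https://imgur.com/a/FIR1BL1", "546c25a59c58ad7")
print(url)
```

The resulting URL can then be passed to requests.get(url).json(), as in the first answer.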