I was writing this code:
import requests
from bs4 import BeautifulSoup as bs
url = "https://keithgalli.github.io/web-scraping/webpage.html"
r = requests.get(url + "webpage.html")
webpage = bs(r.content)
images = webpage.select("div.row div.column img")
image_url = images[0]['src']
full_url = url + image_url
img_data = requests.get(full_url).content
with open('late_combo.jpg', 'wb') as handler:
    handler.write(img_data)
When I run it, I get this error on the line image_url = images[0]['src']: IndexError: list index out of range
Answer:
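I think the error is in the URL. You already initialized it to url = "https://keithgalli.github.io/web-scraping/webpage.html", and then you append "webpage.html" again, so the request goes to https://keithgalli.github.io/web-scraping/webpage.htmlwebpage.html, which doesn't exist. requests doesn't raise an exception for a 404; it simply returns the server's "not found" page, which has no elements matching "div.row div.column img". So webpage.select(...) returns an empty list, and images[0] raises the IndexError.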
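Request the URL as it is instead:
r = requests.get(url)
That fixes the IndexError. Note that full_url = url + image_url will then give https://keithgalli.github.io/web-scraping/webpage.htmlimages/italy/lake_como.jpg, which is also wrong, so build the image URL from the directory https://keithgalli.github.io/web-scraping/ rather than from the page URL.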
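To see the problem in isolation, here is a small standard-library-only sketch comparing plain string concatenation with urllib.parse.urljoin. urljoin is my suggestion, not part of the original answer: it resolves a relative path against the page URL the way a browser does, replacing the last path segment instead of gluing strings together.

```python
from urllib.parse import urljoin

url = "https://keithgalli.github.io/web-scraping/webpage.html"

# String concatenation just appends, producing a page that doesn't exist.
concatenated = url + "webpage.html"
print(concatenated)  # https://keithgalli.github.io/web-scraping/webpage.htmlwebpage.html

# urljoin replaces the last path segment ("webpage.html") with the relative path.
joined = urljoin(url, "webpage.html")
print(joined)  # https://keithgalli.github.io/web-scraping/webpage.html

# The same call builds a correct absolute URL for the relative image src.
image = urljoin(url, "images/italy/lake_como.jpg")
print(image)  # https://keithgalli.github.io/web-scraping/images/italy/lake_como.jpg
```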
Answer:
import requests
from bs4 import BeautifulSoup as bs
url = "https://keithgalli.github.io/web-scraping/webpage.html"
r = requests.get(url)
webpage = bs(r.content,'html.parser')
images = webpage.select("div.row div.column img")
image_url = images[0]['src']
full_url = 'https://keithgalli.github.io/web-scraping/' + image_url
print(full_url)
Output:
https://keithgalli.github.io/web-scraping/images/italy/lake_como.jpg
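For completeness, here is a minimal sketch of the whole task from the question (fetch the page, find the first image, save it), assuming the page still has the "div.row div.column img" layout the answers rely on. The helper names first_image_url and download_first_image, the output filename lake_como.jpg, and the 10-second timeout are my own choices, not from the question. It swaps the hand-built base URL for urllib.parse.urljoin and calls raise_for_status() so a bad URL fails with an HTTP error instead of a confusing IndexError later.

```python
from pathlib import Path
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

PAGE_URL = "https://keithgalli.github.io/web-scraping/webpage.html"


def first_image_url(html: str, page_url: str) -> str:
    """Return the absolute URL of the first image matched by the selector."""
    soup = BeautifulSoup(html, "html.parser")
    images = soup.select("div.row div.column img")
    if not images:
        # Fail with a clear message instead of an IndexError on images[0].
        raise ValueError(f"no 'div.row div.column img' elements found on {page_url}")
    # Resolve the relative src (e.g. "images/italy/lake_como.jpg") against the page URL.
    return urljoin(page_url, images[0]["src"])


def download_first_image(page_url: str, out_path: Path) -> str:
    """Fetch the page, save its first matching image to out_path, return the image URL."""
    page = requests.get(page_url, timeout=10)
    # A 404 raises requests.HTTPError here rather than silently parsing an error page.
    page.raise_for_status()
    img_url = first_image_url(page.text, page_url)
    img = requests.get(img_url, timeout=10)
    img.raise_for_status()
    out_path.write_bytes(img.content)
    return img_url


if __name__ == "__main__":
    # Hypothetical output filename; the question used 'late_combo.jpg'.
    print(download_first_image(PAGE_URL, Path("lake_como.jpg")))
```

Splitting the parsing into first_image_url keeps it testable on a saved HTML string without touching the network.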