from bs4 import BeautifulSoup
import requests
url13cases = 'https://hitechfix.com/product-category/cases/apple-cases/iphone-cases/iphone-13-6-1-cases/'
r = requests.get(url13cases)
soup = BeautifulSoup(r.text, 'html.parser')
img = soup.findAll('img',{"class":"attachment-woocommerce_thumbnail size-woocommerce_thumbnail"})
I am trying to scrape all the pictures from my friend's website, but the products are spread over a few pages. How do I change the URL so the scraper also visits the second, third, and fourth pages? I also want to create an array or objects for each link.
The link for page 2 looks like this: https://hitechfix.com/product-category/cases/apple-cases/iphone-cases/iphone-13-6-1-cases/page/2/
It is the same as the first link, just with an extra /page/2/ at the end. There are two more pages after that, four in total. How do I get all of them and create the objects?
CodePudding user response:
You could use the built-in function range() to iterate over the pages.
In newer code, avoid the old findAll() syntax; use find_all() instead, or select() with CSS selectors.
- For more, take a minute to check the docs
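To make the find_all() / select() point concrete, here is a small sketch that runs both against a made-up HTML snippet. The snippet and the `/a.jpg` and `/logo.png` paths are assumptions for illustration, not markup copied from the real site.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for one product thumbnail plus an unrelated image.
html = '''
<img class="attachment-woocommerce_thumbnail size-woocommerce_thumbnail" src="/a.jpg">
<img class="logo" src="/logo.png">
'''
soup = BeautifulSoup(html, 'html.parser')

# find_all() is the current spelling of findAll(); class_ matches any one class on the tag.
by_find_all = soup.find_all('img', class_='attachment-woocommerce_thumbnail')

# select() takes a CSS selector: <img> tags that carry both classes.
by_select = soup.select('img.attachment-woocommerce_thumbnail.size-woocommerce_thumbnail')

srcs_find_all = [tag['src'] for tag in by_find_all]
srcs_select = [tag['src'] for tag in by_select]
```

Both calls pick out only the thumbnail image here, so either style should work for this page.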
Example
from bs4 import BeautifulSoup
import requests

img_list = []
# Pages 1 to 4; range() excludes its end value.
for i in range(1, 5):
    r = requests.get(f'https://hitechfix.com/product-category/cases/apple-cases/iphone-cases/iphone-13-6-1-cases/page/{i}/')
    # Name the parser explicitly so bs4 does not have to guess one.
    soup = BeautifulSoup(r.text, 'html.parser')
    img_list.extend(soup.find_all('img', {"class": "attachment-woocommerce_thumbnail size-woocommerce_thumbnail"}))

img_list
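The question also asks for an object per link. One possible way to do that, as a rough sketch: the `ProductImage` dataclass and the `page_url`, `parse_images`, and `scrape_all` helpers are names made up for this example, and it assumes each thumbnail `<img>` carries its link in the `src` attribute.

```python
from dataclasses import dataclass

import requests
from bs4 import BeautifulSoup

BASE_URL = 'https://hitechfix.com/product-category/cases/apple-cases/iphone-cases/iphone-13-6-1-cases/'


@dataclass
class ProductImage:
    """Hypothetical container for one scraped image."""
    page: int
    src: str
    alt: str


def page_url(page: int) -> str:
    # Page 1 is the category URL itself; later pages append page/<n>/.
    return BASE_URL if page == 1 else f'{BASE_URL}page/{page}/'


def parse_images(html: str, page: int) -> list[ProductImage]:
    # Turn every thumbnail <img> in the HTML into a ProductImage.
    soup = BeautifulSoup(html, 'html.parser')
    tags = soup.select('img.attachment-woocommerce_thumbnail')
    return [ProductImage(page, t.get('src', ''), t.get('alt', '')) for t in tags]


def scrape_all(last_page: int = 4) -> list[ProductImage]:
    # Fetch pages 1..last_page and collect the images from each one.
    images = []
    for page in range(1, last_page + 1):
        r = requests.get(page_url(page), timeout=10)
        r.raise_for_status()  # stop early on a 404 or server error
        images.extend(parse_images(r.text, page))
    return images
```

Calling `scrape_all()` would then give one list of objects, so `[img.src for img in scrape_all()]` is the plain array of links. Keeping the parsing separate from the download makes it easy to check against saved HTML without hitting the site.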