I want to extract the URL of every button in the sidebar, but I can't get past the first one, and I don't know why or how to fix it. Unfortunately, this is for an assignment, so I can't import anything else.
This is the code I tried:
import requests
from bs4 import BeautifulSoup
import pandas as pd
url = "https://books.toscrape.com/"
genres = ["Travel", "Mystery", "Historical Fiction", "Sequential Art", "Classics", "Philosophy"]
# write your code below
response=requests.get(url, timeout=3)
soup = BeautifulSoup(response.content, 'html.parser')
sidebar=soup.find_all('div',{'class':'side_categories'})
for a in sidebar:
    genre_url=a.find('a').get('href')
    print(genre_url)
I got
catalogue/category/books_1/index.html
I was expecting
catalogue/category/books_1/index.html
catalogue/category/books/travel_2/index.html
catalogue/category/books/mystery_3/index.html
catalogue/category/books/historical-fiction_4/index.html
catalogue/category/books/sequential-art_5/index.html
catalogue/category/books/classics_6/index.html
...
CodePudding user response:
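I used the following CSS selector to find all the genre links (the nested a tags) in the sidebar: .side_categories>ul>li>ul>li>a
Note that it matches only the nested genre links, so the top-level books_1 link is not included in the output.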
import requests
from bs4 import BeautifulSoup
import pandas as pd
url = "https://books.toscrape.com/"
genres = ["Travel", "Mystery", "Historical Fiction", "Sequential Art", "Classics", "Philosophy"]
# write your code below
response=requests.get(url, timeout=3)
soup = BeautifulSoup(response.content, 'html.parser')
genre_url_elems = soup.select(".side_categories>ul>li>ul>li>a")
genre_urls = [e['href'] for e in genre_url_elems]
for url in genre_urls:
    print(url)
Here's the output:
catalogue/category/books/travel_2/index.html
catalogue/category/books/mystery_3/index.html
catalogue/category/books/historical-fiction_4/index.html
catalogue/category/books/sequential-art_5/index.html
catalogue/category/books/classics_6/index.html
catalogue/category/books/philosophy_7/index.html
catalogue/category/books/romance_8/index.html
catalogue/category/books/womens-fiction_9/index.html
catalogue/category/books/fiction_10/index.html
catalogue/category/books/childrens_11/index.html
catalogue/category/books/religion_12/index.html
catalogue/category/books/nonfiction_13/index.html
catalogue/category/books/music_14/index.html
catalogue/category/books/default_15/index.html
catalogue/category/books/science-fiction_16/index.html
catalogue/category/books/sports-and-games_17/index.html
catalogue/category/books/add-a-comment_18/index.html
catalogue/category/books/fantasy_19/index.html
catalogue/category/books/new-adult_20/index.html
catalogue/category/books/young-adult_21/index.html
catalogue/category/books/science_22/index.html
catalogue/category/books/poetry_23/index.html
catalogue/category/books/paranormal_24/index.html
catalogue/category/books/art_25/index.html
catalogue/category/books/psychology_26/index.html
catalogue/category/books/autobiography_27/index.html
catalogue/category/books/parenting_28/index.html
catalogue/category/books/adult-fiction_29/index.html
catalogue/category/books/humor_30/index.html
catalogue/category/books/horror_31/index.html
catalogue/category/books/history_32/index.html
catalogue/category/books/food-and-drink_33/index.html
catalogue/category/books/christian-fiction_34/index.html
catalogue/category/books/business_35/index.html
catalogue/category/books/biography_36/index.html
catalogue/category/books/thriller_37/index.html
catalogue/category/books/contemporary_38/index.html
catalogue/category/books/spirituality_39/index.html
catalogue/category/books/academic_40/index.html
catalogue/category/books/self-help_41/index.html
catalogue/category/books/historical_42/index.html
catalogue/category/books/christian_43/index.html
catalogue/category/books/suspense_44/index.html
catalogue/category/books/short-stories_45/index.html
catalogue/category/books/novels_46/index.html
catalogue/category/books/health_47/index.html
catalogue/category/books/politics_48/index.html
catalogue/category/books/cultural_49/index.html
catalogue/category/books/erotica_50/index.html
catalogue/category/books/crime_51/index.html
For more, read about 'CSS selectors': https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors
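To see why the original code stops at the first link, here is a small sketch that runs against a simplified stand-in for the sidebar markup. The SAMPLE_HTML string is an assumption made for illustration, not the full page. It compares find(), find_all(), and the child-combinator selector used above.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the sidebar markup (an assumption, not the real page).
SAMPLE_HTML = """
<div class="side_categories">
  <ul>
    <li>
      <a href="catalogue/category/books_1/index.html">Books</a>
      <ul>
        <li><a href="catalogue/category/books/travel_2/index.html">Travel</a></li>
        <li><a href="catalogue/category/books/mystery_3/index.html">Mystery</a></li>
      </ul>
    </li>
  </ul>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
sidebar = soup.find("div", {"class": "side_categories"})

# find() stops at the first match, so only the top-level "Books" link comes back.
first_only = sidebar.find("a").get("href")

# find_all() returns every <a> inside the sidebar, including the top-level one.
all_links = [a.get("href") for a in sidebar.find_all("a")]

# The child-combinator selector skips the top-level link and keeps the nested ones.
genre_links = [a["href"] for a in soup.select(".side_categories>ul>li>ul>li>a")]

print(first_only)
print(all_links)
print(genre_links)
```

So in the question's code, the loop runs once (there is a single sidebar div) and find('a') returns only the first link inside it. Swapping find('a') for find_all('a') and looping over the result should give the list the asker expected, including books_1.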
CodePudding user response:
Here you go:
import requests
from bs4 import BeautifulSoup
import pandas as pd
url = "https://books.toscrape.com/"
genres = ["Travel", "Mystery", "Historical Fiction", "Sequential Art", "Classics", "Philosophy"]
# write your code below
response=requests.get(url, timeout=3)
soup = BeautifulSoup(response.content, 'html.parser')
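# sidebar=soup.find_all('div',{'class':'side_categories'})
# Collect every <a> tag on the page that has an href attribute
sidebar=soup.find_all('a',href=True)
for link in sidebar:
    href = link['href']
    # 'catalogue' alone would also match the book and pagination links,
    # so check for the category path to keep only the sidebar links
    if 'catalogue/category' in href:
        print(href)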
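If the goal is only the URLs for the names in the genres list, one possible approach is to map each link's text to its href and then look the names up. This is a rough sketch under a few assumptions: the helper name genre_urls and the SAMPLE_HTML string are made up for illustration, and joining by plain string concatenation assumes the hrefs are relative to the site root, as they are in the output above.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the sidebar markup (an assumption, not the real page).
SAMPLE_HTML = """
<div class="side_categories">
  <ul>
    <li>
      <a href="catalogue/category/books_1/index.html">
        Books
      </a>
      <ul>
        <li><a href="catalogue/category/books/travel_2/index.html">
          Travel
        </a></li>
        <li><a href="catalogue/category/books/historical-fiction_4/index.html">
          Historical Fiction
        </a></li>
        <li><a href="catalogue/category/books/romance_8/index.html">
          Romance
        </a></li>
      </ul>
    </li>
  </ul>
</div>
"""


def genre_urls(soup, wanted, base_url):
    """Hypothetical helper: map each wanted genre name to its absolute URL."""
    links = soup.select(".side_categories a")
    # get_text(strip=True) removes the whitespace around the link text.
    by_name = {a.get_text(strip=True): a["href"] for a in links}
    # Genres that are missing from the sidebar are silently skipped.
    return {name: base_url + by_name[name] for name in wanted if name in by_name}


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
genres = ["Travel", "Historical Fiction", "Philosophy"]
result = genre_urls(soup, genres, "https://books.toscrape.com/")
for name, link in result.items():
    print(name, link)
```

With the real page, the same function could be called on the soup built from response.content. No extra imports are needed beyond the ones the assignment already provides.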