How to scrape text outside of href with BeautifulSoup

Time:05-17

I'm trying to scrape the text "Woodford Reserve Master Collection Five Malt Stouted Mash" from the following:

<a aria-hidden="true" tabindex="-1" id="WC_CatalogEntryDBThumbnailDisplayJSPF_3074457345616901168_link_9b" href="/webapp/wcs/stores/servlet/ProductDisplay?catalogId=10051&amp;storeId=10051&amp;productId=3074457345616901168&amp;langId=-1&amp;partNumber=000086630prod&amp;errorViewName=ProductDisplayErrorView&amp;categoryId=1334014&amp;top_category=25208&amp;parent_category_rn=1334013&amp;urlLangId=&amp;variety=American Whiskey&amp;categoryType=Spirits&amp;fromURL=/webapp/wcs/stores/servlet/CatalogSearchResultView?storeId=10051&catalogId=10051&langId=-1&categoryId=1334014&variety=American+Whiskey&categoryType=Spirits&top_category=&parent_category_rn=&sortBy=5&searchSource=E&pageView=&beginIndex=">Woodford Reserve Master Collection Five Malt Stouted Mash</a>

I am able to scrape the href using the following code, but I can't work out how to scrape the title text separately:


link = []
for product in soup.select('a.catalog_item_name'):
    link.append(product['href'])

print(link)

I have also tried:

for product in soup.select('a.catalog_item_name'):
    link.append(product.a['href'])

print(link)

However, neither attempt captures the title text separately. Thanks in advance for the help!

CodePudding user response:

Try:

data = []
for product in soup.select('a.catalog_item_name'):
    link = product['href']        # attribute value
    title = product.get_text()    # text between <a> and </a>
    data.append([link, title])

print(data)
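
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The HTML string is a shortened, hypothetical stand-in for the markup in the question, and the `catalog_item_name` class on the tag is an assumption taken from the selector the asker says already works (the snippet as posted shows no class attribute). It also shows why `product.a['href']` fails: each `product` is already the `<a>` tag, so `product.a` searches for a nested `<a>` and returns `None`.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the question's markup.
# The class attribute is assumed from the selector used in the question.
html = """
<a class="catalog_item_name" href="/webapp/wcs/stores/servlet/ProductDisplay?productId=3074457345616901168">
  Woodford Reserve Master Collection Five Malt Stouted Mash
</a>
"""

soup = BeautifulSoup(html, "html.parser")

data = []
for product in soup.select("a.catalog_item_name"):
    # `product` is the <a> tag itself; `product.a` would be None here,
    # so the href is read directly from `product`.
    link = product["href"]
    # strip=True trims the surrounding whitespace and newlines.
    title = product.get_text(strip=True)
    data.append((link, title))

print(data)
```

Using `get_text(strip=True)` rather than plain `get_text()` is optional; it only matters when the tag's text is padded with whitespace, as it is in this stand-in.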