I need to write a script that scrapes URLs from a blog page, checks whether a URL contains certain keywords within the link, and then writes out to a CSV file which blog post URL contains the keyword links it found.
As the blog has pagination and over 35 pages/300 blog posts, I'm unsure how to go about this. The URLs that I'm looking for are within each individual blog post.
So far, I've managed to follow a few tutorials on how to get each blog post URL from the homepage by following the pagination.
CodePudding user response:
It is nearly the same: define an empty list to store your special URLs and iterate over your initial result list of URLs:
data = []
for url in result:
    r = requests.get(url).text
    soup = BeautifulSoup(r, "lxml")
    # append whatever "special" URL you extract from the post here
    data.append('specialUrl')
To avoid duplicates / unnecessary requests, iterate over a set():
data = []
for url in set(result):
    r = requests.get(url).text
    soup = BeautifulSoup(r, "lxml")
    # append the special URL you find here
    data.append('FINDSPECIALURL')
Just in case, you can also use break to leave the while loop.
Example
Note: This will only scrape the links from the first blog page into your results - remove the break at the end of the while loop to scrape all the blog pages.
import requests
from bs4 import BeautifulSoup
import pandas as pd

# collect the blog post URLs from the paginated overview pages
page = 1
result = []
while True:
    r = requests.get(f"https://www.snapfish.co.uk/blog/page/{page}/").text
    soup = BeautifulSoup(r, "lxml")
    product = soup.find_all("article", {'class': 'post_list'})
    for data in product:
        result.append(data.find('a').get('href'))
    if soup.find("a", class_='next page-numbers') is None:
        break
    page += 1
    break  # remove this break to scrape all the blog pages

# visit each blog post and collect links whose href contains the keyword
data = []
for url in result:
    r = requests.get(url).text
    soup = BeautifulSoup(r, "lxml")
    for a in soup.select('a[href*="design-detail"]'):
        data.append({
            'urlFrom': url,
            'urlTo': a['href']
        })

pd.DataFrame(data).drop_duplicates().to_csv('result.csv', index=False)
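
If you need to match several keywords rather than just "design-detail", one option is to collect every link from each post and check its href against your own keyword list. A minimal sketch, assuming result already holds the blog post URLs from the loop above and that the keywords list is a placeholder you would replace with your own terms:

import requests
from bs4 import BeautifulSoup
import pandas as pd

# hypothetical keywords - replace with the terms you actually want to find
keywords = ['design-detail', 'photo-book']

data = []
# "result" is the list of blog post URLs collected by the pagination loop above;
# set() avoids requesting the same post twice
for url in set(result):
    r = requests.get(url).text
    soup = BeautifulSoup(r, "lxml")
    for a in soup.select('a[href]'):
        href = a['href']
        for kw in keywords:
            if kw in href:
                # record the post URL, the matching link and the keyword that hit
                data.append({'urlFrom': url, 'urlTo': href, 'keyword': kw})

pd.DataFrame(data).drop_duplicates().to_csv('keyword_links.csv', index=False)

The extra 'keyword' column makes it easy to filter the CSV later by which term matched each link.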