Web scraping/crawling for specific URL details within a blog with pagination

Time:02-17

I need a script that scrapes URLs from a blog, checks whether each URL contains certain keywords, and then writes to a CSV file which blog post URLs contain the keyword links.

As the blog has pagination and over 35 pages/300 blog posts, I'm unsure how to go about this. The URLs that I'm looking for are within each individual blog post.

So far, I've managed to follow a few tutorials on how to get each blog post URL from the homepage by following the pagination.

CodePudding user response:

The approach is nearly the same: define an empty list to store the special URLs, then iterate over your initial result list of URLs:

data = []
for url in result:
    r = requests.get(url).text
    soup = BeautifulSoup(r, "lxml")
    # placeholder: append the special URL(s) found in soup here
    data.append('specialUrl')

To avoid duplicates and unnecessary requests, iterate over set(result):

data = []
for url in set(result):
    r = requests.get(url).text
    soup = BeautifulSoup(r, "lxml")
    # placeholder: append the special URL(s) found in soup here
    data.append('FINDSPECIALURL')
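
One thing to keep in mind with set() is that it does not preserve the order in which the post URLs were collected. If the order matters for the CSV, a minimal sketch of an order-preserving alternative could look like this. The URLs below are hypothetical sample data, not taken from the blog:

```python
# Hypothetical sample data standing in for the `result` list of post URLs.
result = [
    "https://example.com/blog/post-a/",
    "https://example.com/blog/post-b/",
    "https://example.com/blog/post-a/",
]

# dict keys are unique and keep insertion order (Python 3.7+),
# so this drops repeats without reordering the list.
unique_urls = list(dict.fromkeys(result))
print(unique_urls)
```

You could then loop over unique_urls instead of set(result).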

You can also use break to leave the while loop early, for example while testing.

Example

Note: This only scrapes the links from the first blog page into result. Remove the break at the end of the while loop to scrape all the blog pages.

import requests
from bs4 import BeautifulSoup
import pandas as pd

page = 1
result = []

# collect the blog post URLs from the paginated blog index
while True:
    r = requests.get(f"https://www.snapfish.co.uk/blog/page/{page}/").text
    soup = BeautifulSoup(r, "lxml")
    product = soup.find_all("article", {'class': 'post_list'})
    for data in product:
        result.append(data.find('a').get('href'))
    # stop when there is no link to a next page
    if soup.find("a", class_='next page-numbers') is None:
        break
    page += 1
    break  # remove break to scrape all the blog pages

data = []

for url in result:
    r=requests.get(url).text
    soup=BeautifulSoup(r,"lxml")
    for a in soup.select('a[href*="design-detail"]'):
        data.append({
            'urlFrom':url,
            'urlTo':a['href']
        })
        
pd.DataFrame(data).drop_duplicates().to_csv('result.csv', index=False)

Output

urlFrom urlTo
https://www.snapfish.co.uk/blog/what-loving-message-sentiment-to-write-in-your-anniversary-card/ https://www.snapfish.co.uk/design-detail?category=StoreCat_29641&dgId=35d18daa85f844b78c9a7ed0550ca0cf&designId=2b2dbb6233084675828e48e238e2eb9b&sku=CommerceProduct_355343&ptype=cards&pcat=greeting_cards_1989_snapfish_uk&scat=anniversary_cards_10905_snapfish_uk&filters=subCategories~anniversary_cards_10905_snapfish_uk&searchPhrase=&designName=Anniversary Gold Heart&withSku=N&qty=1&dgCatId=anniversary_cards_10905_snapfish_uk&pcatName=Greeting Cards&eoption=CommerceOption_281506#/dgview
https://www.snapfish.co.uk/blog/what-loving-message-sentiment-to-write-in-your-anniversary-card/ https://www.snapfish.co.uk/design-detail?category=StoreCat_29641&dgId=008cec6cdece48c6bf25f13c425f9e4a&designId=acb3720df6a1480ea99dd2f18eec7807&sku=CommerceProduct_355343&ptype=cards&pcat=greeting_cards_1989_snapfish_uk&scat=anniversary_cards_10905_snapfish_uk&filters=subCategories~anniversary_cards_10905_snapfish_uk&searchPhrase=&designName=Heart Wreath Anniversary&withSku=N&qty=1&dgCatId=anniversary_cards_10905_snapfish_uk&pcatName=Greeting Cards&eoption=CommerceOption_281506#/dgview
https://www.snapfish.co.uk/blog/what-loving-message-sentiment-to-write-in-your-anniversary-card/ https://www.snapfish.co.uk/design-detail?category=StoreCat_29641&dgId=b2132bd5de1849479182735dba8857d3&designId=60d4a98f824e48d6badfe4fb443b591f&sku=CommerceProduct_355343&ptype=cards&pcat=greeting_cards_1989_snapfish_uk&scat=anniversary_cards_10905_snapfish_uk&filters=subCategories~anniversary_cards_10905_snapfish_uk&searchPhrase=&designName=XOXO Bold&withSku=N&qty=1&dgCatId=anniversary_cards_10905_snapfish_uk&pcatName=Greeting Cards&eoption=CommerceOption_281506#/dgview
https://www.snapfish.co.uk/blog/what-loving-message-sentiment-to-write-in-your-anniversary-card/ https://www.snapfish.co.uk/design-detail?category=StoreCat_29641&dgId=8261f8e29d8e4178b526ba80012d05f3&designId=c4ac847f6aef4c87a8588ab83d7a7065&sku=CommerceProduct_355343&ptype=cards&pcat=greeting_cards_1989_snapfish_uk&scat=anniversary_cards_10905_snapfish_uk&filters=subCategories~anniversary_cards_10905_snapfish_uk&searchPhrase=&designName=I Found You&withSku=N&qty=1&dgCatId=anniversary_cards_10905_snapfish_uk&pcatName=Greeting Cards&eoption=CommerceOption_281506#/dgview
https://www.snapfish.co.uk/blog/what-to-write-in-a-custom-snapfish-18th-birthday-card/ https://www.snapfish.co.uk/design-detail?category=StoreCat_29641&dgId=2c8420a9f582492c9801dd8a2fb89ba3&designId=765f31622df648fb908b28d73fbf8b40&sku=CommerceProduct_355343&ptype=cards&pcat=birthday_cards_1989_snapfish_uk&scat=for_her_10993_1561482027_snapfish_uk&filters=subCategories~for_friends_10993_1561482050_snapfish_uk|for_her_10993_1561482027_snapfish_uk&searchPhrase=&designName=Make A Wish&withSku=N&qty=1&dgCatId=for_friends_10993_1561482050_snapfish_uk&pcatName=Birthday Cards&eoption=CommerceOption_281506#/dgview
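
The example above matches a single keyword with the CSS selector a[href*="design-detail"]. If you need to check for several keywords, one option is to collect every href from a post and filter the list in plain Python. This is only a sketch: the helper name links_with_keywords, the sample hrefs and the second keyword are assumptions for illustration and do not come from the question.

```python
def links_with_keywords(hrefs, keywords):
    """Return the hrefs that contain at least one keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    # skip None/empty values, keep an href if any keyword occurs in it
    return [h for h in hrefs if h and any(k in h.lower() for k in lowered)]


# Hypothetical hrefs, as they might be collected from one blog post.
sample = [
    "https://example.com/design-detail?designId=1",
    "https://example.com/blog/some-post/",
    None,  # <a> tags without an href give None from .get('href')
    "https://example.com/Photo-Books/classic",
]

matches = links_with_keywords(sample, ["design-detail", "photo-books"])
print(matches)
```

Inside the loop over result, the hrefs could be gathered with [a.get('href') for a in soup.find_all('a')] and passed to the helper, appending one urlFrom/urlTo row per match as before.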