I am currently learning web scraping with Python, reading Web Scraping with Python by Ryan Mitchell.
I am stuck at the "Crawling Sites Through Search" section. For example, the Reuters search given in the book works perfectly, but when I try to find the search URL by myself, as I will have to do in the future, I get this link.
While the second link works fine for a human in the browser, I cannot figure out how to scrape it because of weird class names like this:
The first link gives me simple class names, like this, that I can scrape.
I've encountered the same problem on other sites too. How would I go about scraping such pages, or finding a link with normal class names in the future?
Here's my code example:
from bs4 import BeautifulSoup
import requests
from rich.pretty import pprint
text = "hello"
url = f"https://www.reuters.com/site-search/?query={text}"
response = requests.get(url)
soup = BeautifulSoup(response.text, "lxml")
results = soup.select("div.media-story-card__body__3tRWy")
for result in results:
    pprint(result)
    pprint("###############")
Answer:
You might resort to a prefix attribute value selector, like
div[class^="media-story-card__body__"]
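This assumes that this class is the only one in the class attribute (or at least textually the first one). However, the idea can be extended to checking for a substring anywhere in the attribute with the *= operator, e.g. div[class*="media-story-card__body__"].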
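A minimal, self-contained sketch of these selectors, run against a hypothetical HTML snippet rather than the live page (the hashed suffixes such as 3tRWy and 9xYzQ are invented for illustration). It also shows BeautifulSoup's class_ argument with a compiled regex, which is tested against each individual class, so it does not depend on the class's position in the attribute.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup imitating hashed class names (suffixes are made up)
html = """
<div class="media-story-card__body__3tRWy"><a>First story</a></div>
<div class="card media-story-card__body__9xYzQ"><a>Second story</a></div>
<div class="media-story-card__heading__Ab12C"><a>Not a body</a></div>
"""

soup = BeautifulSoup(html, "html.parser")

# ^= : the whole class attribute must START with the prefix,
# so the second div ("card media-story-card__body__...") is missed
prefix_matches = soup.select('div[class^="media-story-card__body__"]')

# *= : the class attribute only has to CONTAIN the substring
substring_matches = soup.select('div[class*="media-story-card__body__"]')

# Regex on class_: BeautifulSoup checks each individual class value
regex_matches = soup.find_all("div", class_=re.compile(r"^media-story-card__body__"))

print([d.get_text() for d in prefix_matches])     # ['First story']
print([d.get_text() for d in substring_matches])  # ['First story', 'Second story']
print([d.get_text() for d in regex_matches])      # ['First story', 'Second story']
```

On the real page you would swap html for response.text; whether the hashed prefixes stay stable between site deployments is not guaranteed.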
Answer:
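Actually, the required data is loaded from an API via a GET request. You can spot requests like this in your browser's developer tools (Network tab, filtered to Fetch/XHR) while the search page loads, and then call the API directly: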
import requests

# Search endpoint called by the Reuters search page; the search options are a JSON object in "query"
api_url = 'https://www.reuters.com/pf/api/v3/content/fetch/articles-by-search-v2?query={"keyword":"hello","offset":20,"orderby":"display_date:desc","size":20,"website":"reuters"}&d=100&_website=reuters'
data = requests.get(api_url).json()
# print(data)  # uncomment to inspect the full JSON response
for handle in data['result']['articles']:
    print(handle['title'])
Output:
Elton John says hello again with resumed goodbye tour
Blackstone-backed Candle Media acquires Faraway Road Productions
Frustrated Paire tests positive for COVID-19 again
Barcelona sign Spanish forward Torres from Man City
Private equity will be potent Hollywood antihero
Tumultuous year in bond markets draws to a close
Special Report: Pro-Trump news site targets election workers, inspiring wave of menace
More minority faces in film, TV, music as audiences demand diversity
'Emotionally brilliant': singer Adele releases new album '30'
Hamilton happy to see Mercedes boss's fighting spirit
Oprah Winfrey, Reese Witherspoon, Bumble founder invest in Spanx
Singer Lionel Richie signs deal with Universal Music Publishing
Retailers dream of a glitzy Christmas, even as supply chain snarls loom
Blackstone-backed firm nears $3 billion deal for Moonbug Entertainment - sources
North Korea seeks to boost education with toy-like robots
Adele makes music comeback with new single 'Easy On Me'
Blackstone music deal hits right digital note
Adele says she wrote upcoming album for her son
Adele teases new music with video clip
Goodbye Bond, hello Walk of Fame star for Daniel Craig
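A sketch of the same call with the query string built from Python values instead of a hand-written URL, so the keyword and page can be changed. The endpoint and parameter names are copied from the URL above; treating offset and size as pagination controls is an assumption, and since this looks like an internal endpoint rather than a documented public API, its format may change. build_search_params and fetch_titles are hypothetical helper names.

```python
import json

import requests

# Endpoint copied from the answer's URL (appears to be internal, may change)
API_URL = "https://www.reuters.com/pf/api/v3/content/fetch/articles-by-search-v2"


def build_search_params(keyword, offset=0, size=20):
    """Build query-string parameters for one page of results (hypothetical helper)."""
    query = {
        "keyword": keyword,
        "offset": offset,  # assumed: index of the first result to return
        "orderby": "display_date:desc",
        "size": size,      # assumed: number of results per page
        "website": "reuters",
    }
    # Compact separators reproduce the JSON exactly as written in the original URL
    return {"query": json.dumps(query, separators=(",", ":")), "d": 100, "_website": "reuters"}


def fetch_titles(keyword, pages=2, size=20):
    """Yield article titles page by page (performs real network requests)."""
    for page in range(pages):
        params = build_search_params(keyword, offset=page * size, size=size)
        response = requests.get(API_URL, params=params, timeout=10)
        response.raise_for_status()
        for article in response.json()["result"]["articles"]:
            yield article["title"]
```

Usage: for title in fetch_titles("hello"): print(title). Passing params lets requests handle the percent-encoding of the JSON, and requests.Request("GET", API_URL, params=build_search_params("hello")).prepare().url shows the final URL without sending anything.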