I'm trying to webscrape the news from the following URL:
Answer:
As one of the comments mentioned, you can use
As you can see above, there is no div id="search" element; in such cases, the commented-out selectors might work.
Sample usage:
# `soup` is assumed to be a BeautifulSoup object of the search results page
# selectors for headerless request in comments
blockSel = '#search div[eid] div[data-hveid][data-ved] > div[data-hveid]'
# blockSel = '#main > div > div > a'
innerSels = {
    'heading': 'div[role="heading"]',  # 'h3',
    'link': (None, 'href'),  # (None, 'href'),
    'snippet': 'div[role="heading"] div',  # '"parent"a > div div',
    'date': 'div[role="heading"] ~ span div',  # 'div div div > span',
    'site_name': 'g-img span'  # 'h3 div'
}

articles = []
srSectn = soup.select(blockSel)
srsLen = len(srSectn)
for i, s in enumerate(srSectn):
    # use the inner link as the container when the block has one
    if s.select_one('a[jsname]'):
        s = s.select_one('a[jsname]')
    print('', end=f'\radding article {i+1} of {srsLen}...')
    aData = {}
    for k in innerSels:
        sel = innerSels[k]
        target = '"text"'  # default: extract the element's text
        # a (selector, attribute) pair extracts an attribute instead;
        # an attribute of None means "keep the element's raw HTML"
        if isinstance(sel, (tuple, list)) and len(sel) > 1:
            target = None if sel[1] is None else str(sel[1])
            sel = None if sel[0] is None else str(sel[0])
        # a '"parent"' prefix means: search from the block's parent element
        if isinstance(sel, str) and sel.startswith('"parent"'):
            sel = s.parent.select_one(sel.replace('"parent"', '', 1))
        else:
            # a selector of None means: use the block element itself
            sel = s if sel is None else s.select_one(sel)
        if sel is None:
            aData[k] = None
            continue
        if target is None:
            aData[k] = str(sel)
        elif target == '"text"':
            aData[k] = sel.get_text(strip=True)
        else:
            aData[k] = sel.get(target)
    articles.append(aData)
print(f'\radded {len(articles)} articles from {srsLen} sections')
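The innerSels values follow a small convention: a plain string is a CSS selector whose text is extracted; a (selector, attribute) pair extracts that attribute instead, where a None selector means the block element itself and a None attribute means its raw HTML; and a "parent" prefix runs the selector against the block's parent. As a sketch, that decoding step can be pulled out of the loop into a standalone function and checked without any HTML. parse_spec is a hypothetical name, not part of the answer.

```python
# Hypothetical helper: decodes one innerSels value the same way the loop above
# does, without touching any HTML, so the convention can be checked on its own.
def parse_spec(spec):
    """Return (selector, target, from_parent) for one innerSels value.

    selector    -- CSS selector string, or None meaning "the block element itself"
    target      -- '"text"' for get_text(), None for the raw HTML, or an
                   attribute name such as 'href'
    from_parent -- True if the selector had the '"parent"' prefix, i.e. it is
                   meant to be run against the block's parent element
    """
    target = '"text"'  # default: extract the element's text
    selector = spec
    if isinstance(spec, (tuple, list)) and len(spec) > 1:
        target = None if spec[1] is None else str(spec[1])
        selector = None if spec[0] is None else str(spec[0])
    from_parent = isinstance(selector, str) and selector.startswith('"parent"')
    if from_parent:
        selector = selector.replace('"parent"', '', 1)
    return selector, target, from_parent


# Values taken from innerSels and its commented-out alternatives
for spec in ['div[role="heading"]', (None, 'href'), '"parent"a > div div']:
    print(parse_spec(spec))
```

This prints `('div[role="heading"]', '"text"', False)`, `(None, 'href', False)` and `('a > div div', '"text"', True)`.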
Answer:
In addition, to simplify the selection a bit, you could use every <a> that contains an <h3> as the container for your iterations:
soup.select('a:has(h3)')
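The selector a:has(h3) matches every <a> that has an <h3> somewhere among its descendants. To make that concrete without BeautifulSoup, here is a rough standard-library sketch of the same selection using html.parser. This is a swapped-in technique, not the answer's code; the class name AnchorsWithH3 and the toy HTML are hypothetical, and it assumes <a> elements are never nested.

```python
from html.parser import HTMLParser


class AnchorsWithH3(HTMLParser):
    """Hypothetical helper: collect (href, heading text) for every <a> that
    contains an <h3> anywhere inside it, roughly what
    soup.select('a:has(h3)') followed by e.h3.get_text() gives.
    Simplification: assumes <a> elements are never nested."""

    def __init__(self):
        super().__init__()
        self.results = []     # (href, heading text) pairs, in document order
        self._in_a = False    # currently inside an <a>?
        self._in_h3 = False   # currently inside an <h3> within that <a>?
        self._href = None     # href of the current <a>
        self._heading = None  # heading text; None until an <h3> is seen

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self._in_a = True
            self._href = dict(attrs).get('href')
            self._heading = None
        elif tag == 'h3' and self._in_a:
            self._in_h3 = True
            if self._heading is None:
                self._heading = ''

    def handle_endtag(self, tag):
        if tag == 'h3':
            self._in_h3 = False
        elif tag == 'a' and self._in_a:
            # keep the link only if an <h3> was found inside it
            if self._heading is not None:
                self.results.append((self._href, self._heading))
            self._in_a = False

    def handle_data(self, data):
        if self._in_h3:
            self._heading += data


html = ('<div><a href="/url?q=https://a.example/"><h3>First</h3></a>'
        '<a href="/other">No heading</a>'
        '<a href="/url?q=https://b.example/"><div><h3>Second</h3></div></a></div>')
parser = AnchorsWithH3()
parser.feed(html)
print(parser.results)
```

The link without a heading is skipped, and the second link matches even though its <h3> is nested inside a <div>, just as :has() looks at all descendants.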
Example
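Uses cookies={'CONSENT':'YES+'} because it has to be set from my location; from yours it can probably be ignored.
from bs4 import BeautifulSoup
import requests
import pandas as pd

url = 'https://www.google.com/search?num=250&q=Apple innovation performance&oq=Apple innovation performance=1600&source=lnt&tbs=cdr:1,cd_min:1/1/2018,cd_max:12/31/2018&tbm=nws&hl=en-US'
# the User-Agent header and CONSENT cookie affect which markup Google returns
response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, cookies={'CONSENT': 'YES+'})

data = []
soup = BeautifulSoup(response.text, 'html.parser')

# every result link that wraps a headline (<h3>) is treated as one news item
for e in soup.select('a:has(h3)'):
    site_div = e.h3.find_next_sibling('div')
    data.append({
        'title': e.h3.get_text(),
        'date': e.span.get_text(),
        'excerpt': e.br.previous,  # node parsed just before the first <br>
        'site': site_div.get_text() if site_div else None,
        # hrefs start with '/url?q='; str.strip() would remove any of those
        # *characters* from both ends, so drop the exact prefix (Python 3.9+)
        'url': e.get('href').removeprefix('/url?q=')
    })

pd.DataFrame(data)  # displayed as a table in a notebook; wrap in print() in a script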
Output
| | title | date | excerpt | site | url |
---|---|---|---|---|---|
0 | Apple iPhone 14: Premium Smartphone with Innovation Issues | 4 days ago | The Apple iPhone 14 performs very well in our review and achieves top scores primarily in the performance, display, and camera categories. | NotebookCheck.net | https://www.notebookcheck.net/Apple-iPhone-14-Premium-Smartphone-with-Innovation-Issues.661772.0.html&sa=U&ved=2ahUKEwicrvmP7PD6AhVPDuwKHQqAA5IQxfQBegQIBxAC&usg=AOvVaw3iLsrux3Epc_tFQbj7MUh1 |
1 | Improvement over innovation: are Apple's latest products a letdown? | 16 hours ago | Improvement over innovation: are Apple's latest products a letdown? ... for those willing to pay more for Pro models, improved performance. | The Oxford Student | https://www.oxfordstudent.com/2022/10/20/improvement-over-innovation-are-apples-latest-products-a-letdown/&sa=U&ved=2ahUKEwicrvmP7PD6AhVPDuwKHQqAA5IQxfQBegQIYxAC&usg=AOvVaw1gwOjOUJox68KMeiPj_CR8 |
2 | Apple iPad Pro: New generation with M2 chip, Wi-Fi 6E and more | 2 days ago | A 10-core GPU with up to 35 percent faster graphics performance ... Otherwise, the innovations of the new Apple iPad Pro are in the details. | Basic Tutorials | https://basic-tutorials.com/news/apple-ipad-pro-new-generation-with-m2-chip-wi-fi-6e-and-more/&sa=U&ved=2ahUKEwicrvmP7PD6AhVPDuwKHQqAA5IQxfQBegQIYhAC&usg=AOvVaw07b023WuvT4kB1XoO2-yla |
3 | Apple ditched Intel and never looked back — are other laptop ... | 15 hours ago | Called the M1 chip, Apple stuffed it inside a Mac Mini, ... products with the most extraordinary performance, most innovative technology,... | Laptop Mag | https://www.laptopmag.com/features/apple-ditched-intel-and-never-looked-back-are-other-laptop-makers-doing-the-same&sa=U&ved=2ahUKEwicrvmP7PD6AhVPDuwKHQqAA5IQxfQBegQIYRAC&usg=AOvVaw3IjNFjvFz_B3B3Xzu767gh |
4 | Chinese supply chain able to cope with impact of potential Apple ... | 1 day ago | One of Apple's manufacturers in China has been instructed to immediately ... assessment of the iPhone 14 series' performance in the Chinese... | Global Times | https://www.globaltimes.cn/page/202210/1277497.shtml&sa=U&ved=2ahUKEwicrvmP7PD6AhVPDuwKHQqAA5IQxfQBegQIXhAC&usg=AOvVaw2dUujWfIkidI1-TpdWsqvh |
...
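The url column above still carries Google's redirect parameters (&sa=U&ved=...&usg=...), since only the leading /url?q= is trimmed from the href. Note also that str.strip('/url?q=') treats its argument as a set of characters to remove from both ends, not as a prefix: '/url?q=https://example.com/tour'.strip('/url?q=') returns 'https://example.com/to'. A more robust sketch, assuming the hrefs are '/url?q=' followed by the values shown in the url column, is to parse the query string with the standard library and read the q parameter. target_url is a hypothetical helper name, and the example.com URL is made up for illustration.

```python
from urllib.parse import parse_qs, urlparse


def target_url(href):
    """Hypothetical helper: return the destination of a Google redirect href
    such as '/url?q=<target>&sa=U&ved=...', or href unchanged when there is
    no 'q' query parameter."""
    query = parse_qs(urlparse(href).query)  # {'q': [...], 'sa': [...], ...}
    return query['q'][0] if 'q' in query else href


# href assumed to be '/url?q=' + the Global Times value from the url column
href = ('/url?q=https://www.globaltimes.cn/page/202210/1277497.shtml'
        '&sa=U&ved=2ahUKEwicrvmP7PD6AhVPDuwKHQqAA5IQxfQBegQIXhAC'
        '&usg=AOvVaw2dUujWfIkidI1-TpdWsqvh')
print(target_url(href))  # https://www.globaltimes.cn/page/202210/1277497.shtml

# str.strip removes characters, not a prefix, so it can eat the end of a URL
print('/url?q=https://example.com/tour'.strip('/url?q='))  # https://example.com/to
print(target_url('/url?q=https://example.com/tour'))       # https://example.com/tour
```

In the answer's loop this would replace the url entry with target_url(e.get('href')); parse_qs also percent-decodes the target if Google encodes it.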