Error when web scraping news from CNN using Selenium and bs4

Time:03-15

I wrote this code to scrape news on a specific topic from CNN:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

serch_term = input('What News are you looking for today? ')

service = Service(executable_path='chromedriver.exe')
driver = webdriver.Chrome(service=service)
driver.get(f'https://edition.cnn.com/search?q={serch_term}')

soup = BeautifulSoup(driver.page_source,'html.parser' )
soup.select('h3.cnn-search__result-headline')

But it's not working. I'm getting this error after Chrome opens with the CNN site:

DevTools listening on ws://127.0.0.1:65095/devtools/browser/05c3af16-cb5a-423c-af0b-c6cc96af980d
[11496:15920:0314/183947.010:ERROR:ssl_client_socket_impl.cc(995)] handshake failed; returned -1, SSL error code 1, net_error -200
PS C:\Users\user\Desktop\Informatik\Praktik\Projekte\Python\stiil_working_on\news_automation> [3408:22012:0314/183950.356:ERROR:device_event_log_impl.cc(214)] [18:39:50.360] USB: usb_device_handle_win.cc:1049 Failed to read descriptor from node connection: Ein an das System angeschlossenes Gerät funktioniert nicht. (0x1F)
[3408:22012:0314/183950.356:ERROR:device_event_log_impl.cc(214)] [18:39:50.362] USB: usb_device_handle_win.cc:1049 Failed to read descriptor from node connection: Ein an das System angeschlossenes Gerät funktioniert nicht. (0x1F)
[11496:15920:0314/183953.096:ERROR:ssl_client_socket_impl.cc(995)] handshake failed; returned -1, SSL error code 1, net_error -200
[15208:11512:0314/184146.206:ERROR:gpu_init.cc(440)] Passthrough is not supported, GL is disabled, ANGLE is 

CodePudding user response:

With a term read through the input function, the search finds no results and an error is raised, but a general search with a fixed term works. Please just run the code.

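from bs4 import BeautifulSoup
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

serch_term = 'News'

url = f'https://edition.cnn.com/search?q={serch_term}'
print(url)

# Selenium 4 takes the driver path through a Service object
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service)
driver.maximize_window()

driver.get(url)
# Give the search results time to load before reading the page source
time.sleep(4)

soup = BeautifulSoup(driver.page_source, 'html.parser')
#driver.close()

# Each headline is a link inside an h3.cnn-search__result-headline element
for link in soup.select('h3.cnn-search__result-headline > a'):
    title = link.text
    print(title)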

Output:

https://www.cnn.com/europe/live-news/ukraine-russia-putin-news-03-14-22/index.html
https://www.cnn.com/2022/03/14/energy/india-russia-oil/index.html
https://www.cnn.com/2022/03/14/us/new-york-city-washington-dc-homeless-shootings/index.html
https://www.cnn.com/2022/03/14/politics/breonna-taylor-mother-federal-charges-officers/index.html
https://www.cnn.com/2022/03/14/politics/biden-possible-european-trip/index.html
https://www.cnn.com/2022/03/07/world/what-we-know-brittney-griner-arrest-russia/index.html
https://www.cnn.com/2022/03/14/middleeast/mideast-summary-03-14-2022-intl/index.html
https://www.cnn.com/2022/03/14/energy/oil-prices/index.html
https://www.cnn.com/2022/03/14/tech/pete-davidson-blue-origin-launch-scn/index.html
https://www.cnn.com/2022/03/14/politics/donald-trump-south-carolina-speech/index.html
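
If you still want to read the term with input, one thing that may help (an assumption, the thread doesn't confirm it as the cause) is encoding the term before putting it into the URL, since spaces or characters like & in the typed text would otherwise end up raw in the query string. A minimal sketch using only the standard library; build_search_url is a hypothetical helper name, not something from the code above:

```python
from urllib.parse import urlencode

# Search endpoint taken from the code above
SEARCH_URL = 'https://edition.cnn.com/search'


def build_search_url(term):
    """Return the search URL with the term encoded (hypothetical helper)."""
    # strip() drops stray whitespace typed at the prompt;
    # urlencode() escapes spaces, '&', '#' and other reserved characters
    return f"{SEARCH_URL}?{urlencode({'q': term.strip()})}"


print(build_search_url('ukraine war'))
print(build_search_url('oil & gas'))
```

You could then pass build_search_url(input('What News are you looking for today? ')) to driver.get instead of the f-string.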