QUESTION
Before you get started, explore the website https://www.abc.net.au/news/justin
We are interested in the titles of, and hyperlinks to, the news items in the Just In section. More specifically, that's the section highlighted in the following image:
The content on the page has obviously been updated since this exercise was set up. Still, while the exact content has changed, the principle stays the same.
What you need to do is:
Scrape (1) the titles, (2) the hyperlinks, and (3) the descriptions of the news items in the highlighted section of the page. In the example in the screenshot, the titles are 'Serious grounds to be concerned about... ', 'Tourism sandbox: Phuket... ', etc.; the urls are the ones you are directed to when you click those titles; and the descriptions are 'Allies of jailed... ', 'Thailand's resort mixes island... ', etc., that belong to the titles.
Save the information into a CSV file named 'abcnews.csv' that contains three variables: 'title', 'url', and 'description'. One row for each article, combining the title, the hyperlink, and the description for that article.
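To illustrate the target output format, here is a minimal sketch of writing such a three-column table with pandas. The rows are made-up placeholders for illustration only, not real scraped articles:

```python
import pandas as pd

# Placeholder rows standing in for scraped values (made up for illustration)
rows = [
    {'title': 'Example headline',
     'url': 'https://www.abc.net.au/news/example',
     'description': 'Example summary text'},
]

# One row per article, with the three required columns
df = pd.DataFrame(rows, columns=['title', 'url', 'description'])
df.to_csv('abcnews.csv', index=False)
```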
At present my code cannot achieve this. This is what I have so far:
from urllib.request import Request, urlopen
import ssl
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.abc.net.au/news/justin'

# Fetch the page with a browser-like user agent
headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
req = Request(url, headers=headers)
context = ssl._create_unverified_context()
uClient = urlopen(req, context=context)
html = uClient.read()
uClient.close()

# Parse the page and pull out the section of interest
soup = BeautifulSoup(html, 'html.parser')
divofinterest = soup.find('div', class_='_3OXQ1 _26IxR _3bGVu')

dataset = []
for item in divofinterest.find_all('a'):
    title = item.find('p').getText()
    url = item['href']
    print(title)
    print(url)
    print()
    dataset.append({'title': title, 'url': url})

dataset = pd.DataFrame(dataset)
dataset.to_csv('abcnews.csv', sep=';', index=False)
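The general pattern for pairing each title and link with its description might look like the following sketch. It runs on a stand-in HTML snippet, since the real class names on abc.net.au (such as '_3OXQ1 _26IxR _3bGVu') are generated and change over time, so the selectors here are assumptions you would need to adapt to the live markup:

```python
from bs4 import BeautifulSoup
import pandas as pd

# Stand-in HTML mimicking the structure described above; the real ABC
# markup uses different (and changing) class names.
html = """
<div class="card">
  <a href="/news/story-1"><p>First headline</p></a>
  <p class="description">First summary</p>
</div>
<div class="card">
  <a href="/news/story-2"><p>Second headline</p></a>
  <p class="description">Second summary</p>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
rows = []
for card in soup.find_all('div', class_='card'):
    link = card.find('a')
    rows.append({
        'title': link.find('p').get_text(strip=True),
        'url': link['href'],
        'description': card.find('p', class_='description').get_text(strip=True),
    })

df = pd.DataFrame(rows, columns=['title', 'url', 'description'])
df.to_csv('abcnews.csv', sep=';', index=False)
```

The key idea is to iterate over the per-article container (here the hypothetical 'card' div) rather than over the bare anchors, so that the title, url, and description for one article all come from the same container and stay aligned row by row.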