I want to extract the title and the link of each item from a bs4.element.ResultSet into a pandas DataFrame:
Code:
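from urllib.parse import quote
from urllib.request import urlopen
from bs4 import BeautifulSoup as soup
from newspaper import Config  # assumption: Config comes from newspaper3k; it is not used for the request below

user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36'
config = Config()
config.browser_user_agent = user_agent
user_input = "Solarpanels"
# Percent-encode the query, because urlopen rejects URLs that contain a raw space
query = quote(f'{user_input} when:14d')
site = f'https://news.google.com/rss/search?q={query}&hl=en-GB&gl=DE&ceid=GB:en'
op = urlopen(site)
rd = op.read()
sp_page = soup(rd, 'xml')  # the 'xml' parser requires lxml
news_list = sp_page.find_all('item')
print(type(news_list))
print(news_list)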
Output:
<class 'bs4.element.ResultSet'>
[<item><title>Australian research finds cost-effective way to recycle solar panels - The Guardian</title><link>https://www.theguardian.com/environment/2022/oct/16/australian-research-finds-cost-effective-way-to-recycle-solar-panels</link><guid isPermaLink="false">1605236140</guid><pubDate>Sat, 15 Oct 2022 23:51:00 GMT</pubDate><description><ol><li><a href="https://www.theguardian.com/environment/2022/oct/16/australian-research-finds-cost-effective-way-to-recycle-solar-panels" target="_blank">Australian research finds cost-effective way to recycle solar panels</a>&nbsp;&nbsp;<font color="#6f6f6f">The Guardian</font></li><li><a href="https://www.techjuice.pk/australian-researchers-find-cost-effective-way-to-recycle-solar-panels/" target="_blank">Australian Researchers Find Cost-Effective Way To Recycle Solar Panels</a>&nbsp;&nbsp;<font color="#6f6f6f">TechJuice</font></li><li><a href="https://www.esi-africa.com/industry-sectors/business-and-markets/how-could-recycling-solar-panels-be-scaled-up-for-sustainable-effect/" target="_blank">How could recycling solar panels be scaled up for sustainable effect</a>&nbsp;&nbsp;<font color="#6f6f6f">ESI Africa</font></li><li><a href="https://www.digitaljournal.com/pr/solar-panel-recycling-market-to-rise-at-37-cagr-during-forecast-period-tmr-study" target="_blank">Solar Panel Recycling Market to Rise at 37% CAGR during Forecast Period: TMR Study</a>&nbsp;&nbsp;<font color="#6f6f6f">Digital Journal</font></li><li><strong><a href="https://news.google.com/stories/CAAqNggKIjBDQklTSGpvSmMzUnZjbmt0TXpZd1NoRUtEd2lzNjdmOUJSR3NNT0h4Y0h5dF9TZ0FQAQ?oc=5" target="_blank">View Full coverage on Google News</a></strong></li></ol></description><source url="https://www.theguardian.com">The Guardian</source></item>
... and much more
I have tried several approaches, but I can't get it to work.
Answer:
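Try collecting the fields of each item into a list of dictionaries and building the DataFrame from that list: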
import requests
import pandas as pd
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36"
}
user_input = "Solarpanels"
site = f"https://news.google.com/rss/search?q={user_input} when:14d&hl=en-GB&gl=DE&ceid=GB:en"
# The "xml" parser requires lxml to be installed
soup = BeautifulSoup(requests.get(site, headers=headers).content, "xml")

all_data = []
for item in soup.select("item"):
    all_data.append(
        {
            "title": item.title.text,
            "link": item.link.text,
            "pubDate": item.pubDate.text,
            # The description holds escaped HTML, so parse it a second time to get plain text
            "description": BeautifulSoup(
                item.description.text, "html.parser"
            ).get_text(strip=True),  # or .get_text(strip=True, separator=" ")
            "source": item.source.text,
            "source_url": item.source["url"],
        }
    )

df = pd.DataFrame(all_data)
# to_markdown() requires the tabulate package
print(df.head().to_markdown(index=False))
Prints:
| title | link | pubDate | description | source | source_url |
|---|---|---|---|---|---|
| Australian research finds cost-effective way to recycle solar panels - The Guardian | https://www.theguardian.com/environment/2022/oct/16/australian-research-finds-cost-effective-way-to-recycle-solar-panels | Sat, 15 Oct 2022 23:51:00 GMT | Australian research finds cost-effective way to recycle solar panelsThe GuardianAustralian Researchers Find Cost-Effective Way To Recycle Solar PanelsTechJuiceHow could recycling solar panels be scaled up for sustainable effectESI AfricaSolar Panel Recycling Market to Rise at 37% CAGR during Forecast Period: TMR StudyDigital JournalView Full coverage on Google News | The Guardian | https://www.theguardian.com |
| Business Matters: Solar Panels on Commercial Property: Why You Should Make the Switch - Insider Media | https://www.insidermedia.com/blogs/north-west/business-matters-solar-panels-on-commercial-property-why-you-should-make-the-switch | Mon, 17 Oct 2022 09:13:35 GMT | Business Matters: Solar Panels on Commercial Property: Why You Should Make the SwitchInsider Media | Insider Media | https://www.insidermedia.com |
| Cost of living: The people using solar panels and turbines to reduce bills - bbc.co.uk | https://www.bbc.co.uk/news/uk-england-essex-62967716 | Wed, 05 Oct 2022 07:00:00 GMT | Cost of living: The people using solar panels and turbines to reduce billsbbc.co.uk | bbc.co.uk | https://www.bbc.co.uk |
| School applies for 120 solar panels - Stamford Mercury | https://www.stamfordmercury.co.uk/news/school-applies-for-120-solar-panels-9278921/ | Mon, 17 Oct 2022 11:00:00 GMT | School applies for 120 solar panelsStamford Mercury | Stamford Mercury | https://www.stamfordmercury.co.uk |
| Solar panels enable Lanarkshire village hall to cut running costs by 80 per cent - Daily Record | https://www.dailyrecord.co.uk/in-your-area/lanarkshire/solar-panels-enable-lanarkshire-village-28211459 | Sun, 16 Oct 2022 18:50:00 GMT | Solar panels enable Lanarkshire village hall to cut running costs by 80 per centDaily Record | Daily Record | https://www.dailyrecord.co.uk |
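If only the title and the link are needed, as in the question, the same idea can be reduced to two columns. The following is a minimal offline sketch, not a definitive implementation: SAMPLE_RSS is a made-up two-item feed standing in for the real Google News response, and items_to_frame is a hypothetical helper name. It assumes lxml is installed for the "xml" parser.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Made-up feed used only for illustration; the real response has the same item layout.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <item>
      <title>First headline - Example Source</title>
      <link>https://example.com/first</link>
    </item>
    <item>
      <title>Second headline - Another Source</title>
      <link>https://example.com/second</link>
    </item>
  </channel>
</rss>"""


def items_to_frame(news_list):
    """Build a title/link DataFrame from a ResultSet of <item> tags (hypothetical helper)."""
    rows = [
        {
            "title": item.find("title").get_text(strip=True),
            "link": item.find("link").get_text(strip=True),
        }
        for item in news_list
    ]
    # Passing columns keeps the column order stable even when the ResultSet is empty.
    return pd.DataFrame(rows, columns=["title", "link"])


# find_all returns the same bs4.element.ResultSet type as in the question.
news_list = BeautifulSoup(SAMPLE_RSS, "xml").find_all("item")
df = items_to_frame(news_list)
print(df)
```

The "xml" parser matters here: with "html.parser", link is treated as an empty HTML element and its URL text would not be returned by item.link.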