I scraped some data from Google News into a DataFrame:
DataFrame:
df
     title                                              link                                               pubDate                        description                                        source                       source_url
0    Australian research finds cost-effective way t...  https://news.google.com/__i/rss/rd/articles/CB...  Sat, 15 Oct 2022 23:51:00 GMT  Australian research finds cost-effective way t...  The Guardian                 https://www.theguardian.com
1    Something New Under the Sun: Floating Solar Pa...  https://news.google.com/__i/rss/rd/articles/CB...  Tue, 18 Oct 2022 11:49:11 GMT  Something New Under the Sun: Floating Solar Pa...  Voice of America - VOA News  https://www.voanews.com
2    Adapt solar panels for sub-Saharan Africa - Na...  https://news.google.com/__i/rss/rd/articles/CB...  Tue, 18 Oct 2022 09:06:41 GMT  Adapt solar panels for sub-Saharan AfricaNatur...  Nature.com                   https://www.nature.com
3    Cost of living: The people using solar panels ...  https://news.google.com/__i/rss/rd/articles/CB...  Wed, 05 Oct 2022 07:00:00 GMT  Cost of living: The people using solar panels ...  BBC                          https://www.bbc.co.uk
4    Business Matters: Solar Panels on Commercial P...  https://news.google.com/__i/rss/rd/articles/CB...  Mon, 17 Oct 2022 09:13:35 GMT  Business Matters: Solar Panels on Commercial P...  Insider Media                https://www.insidermedia.com
...  ...                                                ...                                                ...                            ...                                                ...                          ...
What I want to do now is iterate over the "link" column, summarize every article (newspaper's Article, which uses NLTK for the summary), and add each summary to a new column. Here is an example for a single row:
from newspaper import Article

article = Article(df.iloc[4, 1])  # get the URL from the "link" column (row 4)
article.download()
article.parse()
article.nlp()  # builds the summary
summary = article.summary
print(summary)
Output:
North WestGemma Cornwall, Head of Sustainability of Anderton Gables, looks into the benefit of solar panels.
And, with the cost of solar panels continually dropping, it is becoming increasingly affordable for commercial property owners.
Reduce your energy spendMost people are familiar with solar energy, but many are unaware of the significant financial savings that can be gained by installing solar panels in commercial buildings.
As with all things, there are pros and cons to weigh up when considering solar panels.
If you’re considering solar panels for your property, contact one of the Anderton Gables team, who can advise you on the best course of action.
I tried a little bit, but I couldn't make it work...
Thanks for your help!
Answer:
A plain for loop will be slow, but it should work for a small dataset. Iterate over all the links, apply the same steps to each one, and then create a new column in the DataFrame from the collected results:
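from newspaper import Article

summaries = []
# The article URLs are in the "link" column; "source_url" only holds the publisher's homepage.
for url in df['link'].values:
    article = Article(url)
    article.download()
    article.parse()
    article.nlp()
    summaries.append(article.summary)
# One summary per row, in the same order as the DataFrame.
df['summaries'] = summaries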
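One caveat with the loop: if a single download or parse step raises, the whole run stops and the summaries collected so far are lost. Below is a minimal sketch of a more forgiving variant. The names summarize_url and summarize_all are hypothetical helpers, not part of newspaper, and catching every Exception is an assumption made for brevity; you may want to narrow it.

```python
from typing import Callable, Iterable, List, Optional


def summarize_url(url: str) -> str:
    """Download one article and return the summary built by newspaper."""
    # Imported lazily so summarize_all can be tried out without newspaper installed.
    from newspaper import Article

    article = Article(url)
    article.download()
    article.parse()
    article.nlp()
    return article.summary


def summarize_all(
    urls: Iterable[str],
    summarize: Callable[[str], str] = summarize_url,
) -> List[Optional[str]]:
    """Summarize each URL; a URL that fails yields None instead of aborting the run."""
    results: List[Optional[str]] = []
    for url in urls:
        try:
            results.append(summarize(url))
        except Exception as exc:  # broad on purpose in this sketch
            print(f"skipped {url}: {exc}")
            results.append(None)
    return results


# Usage with the DataFrame from the question (needs network access):
# df["summaries"] = summarize_all(df["link"])
```

Rows whose article could not be fetched then simply hold None, so they can be retried or dropped later.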
Or you could define a custom function and then apply it to the column with Series.apply:
from newspaper import Article

def get_description(x):
    # x is one article URL taken from the "link" column
    art = Article(x)
    art.download()
    art.parse()
    art.nlp()
    return art.summary

df['summary'] = df['link'].apply(get_description)
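Both versions above fetch one article at a time, so most of the run is spent waiting on the network. One possible way to speed this up, sketched here under the assumption that the target sites tolerate a few parallel requests, is to run the function in a thread pool. The helper name map_concurrently and the default of 8 workers are illustrative choices, not anything provided by newspaper or pandas.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def map_concurrently(
    func: Callable[[str], str],
    urls: Iterable[str],
    max_workers: int = 8,
) -> List[str]:
    """Call func on every URL using a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # regardless of which download finishes first.
        return list(pool.map(func, urls))


# Usage with the function defined above (needs network access):
# df["summary"] = map_concurrently(get_description, df["link"])
```

Note that an exception raised inside func is re-raised when the results are collected, so in practice you would probably pass a function that handles its own errors.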