Python/Pandas/NLTK: Iterating through a DataFrame, get value, transform it and add the new value to

Time:10-20

I scraped some data from Google News into a DataFrame:

DataFrame:

df

title   link    pubDate     description     source  source_url
0   Australian research finds cost-effective way t...   https://news.google.com/__i/rss/rd/articles/CB...   Sat, 15 Oct 2022 23:51:00 GMT   Australian research finds cost-effective way t...   The Guardian    https://www.theguardian.com
1   Something New Under the Sun: Floating Solar Pa...   https://news.google.com/__i/rss/rd/articles/CB...   Tue, 18 Oct 2022 11:49:11 GMT   Something New Under the Sun: Floating Solar Pa...   Voice of America - VOA News     https://www.voanews.com
2   Adapt solar panels for sub-Saharan Africa - Na...   https://news.google.com/__i/rss/rd/articles/CB...   Tue, 18 Oct 2022 09:06:41 GMT   Adapt solar panels for sub-Saharan AfricaNatur...   Nature.com  https://www.nature.com
3   Cost of living: The people using solar panels ...   https://news.google.com/__i/rss/rd/articles/CB...   Wed, 05 Oct 2022 07:00:00 GMT   Cost of living: The people using solar panels ...   BBC     https://www.bbc.co.uk
4   Business Matters: Solar Panels on Commercial P...   https://news.google.com/__i/rss/rd/articles/CB...   Mon, 17 Oct 2022 09:13:35 GMT   Business Matters: Solar Panels on Commercial P...   Insider Media   https://www.insidermedia.com
...     ...     ...     ...     ...     ...     ...

What I want to do now is iterate through the "link" column, summarize every article with NLTK, and add each summary to a new column. Here is an example for a single row:

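from newspaper import Article  # assumption: Article is the newspaper3k class

article = Article(df.iloc[4, 1])  # get the URL from the "link" column (row 4, column 1)
article.download()
article.parse()
article.nlp()  # extracts keywords and builds the summary
summary = article.summary
print(summary)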

Output:

North WestGemma Cornwall, Head of Sustainability of Anderton Gables, looks into the benefit of solar panels.
And, with the cost of solar panels continually dropping, it is becoming increasingly affordable for commercial property owners.
Reduce your energy spendMost people are familiar with solar energy, but many are unaware of the significant financial savings that can be gained by installing solar panels in commercial buildings.
As with all things, there are pros and cons to weigh up when considering solar panels.
If you’re considering solar panels for your property, contact one of the Anderton Gables team, who can advise you on the best course of action.

I tried a little bit, but I couldn't make it work...

Thanks for your help!

CodePudding user response:

A plain for loop will be slow, because every article has to be downloaded and parsed one after another, but it should work for a small dataset. Iterate through all the links, apply the required transformations to each one, and then store the results in a new column of the DataFrame:

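from newspaper import Article  # assumption: Article is the newspaper3k class

summaries = []
for link in df['link'].values:  # article URLs live in "link", not "source_url"
    article = Article(link)
    article.download()
    article.parse()
    article.nlp()
    summaries.append(article.summary)
df['summary'] = summaries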
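
One practical concern with the loop above is that a single failed download or parse would abort the whole run. The following is a minimal sketch of a more tolerant version, assuming it is acceptable to record None for links that fail. The names summarize_links and fake_fetch are hypothetical and introduced only for illustration; fake_fetch stands in for the real download/parse/nlp steps so the example runs without network access.

```python
def summarize_links(links, fetch_summary):
    """Return one summary per link, with None where fetching failed.

    fetch_summary is any callable that takes a URL and returns a summary
    string. Passing it in keeps the loop independent of the scraping library.
    """
    summaries = []
    for link in links:
        try:
            summaries.append(fetch_summary(link))
        except Exception as exc:  # broad on purpose: one bad link should not stop the run
            print(f"skipping {link}: {exc}")
            summaries.append(None)
    return summaries


def fake_fetch(link):
    """Hypothetical stand-in for the Article download/parse/nlp steps."""
    if "bad" in link:
        raise ValueError("download failed")
    return f"summary of {link}"


links = [
    "https://example.com/a",
    "https://example.com/bad-link",
    "https://example.com/c",
]
summaries = summarize_links(links, fake_fetch)
```

With the real DataFrame, the same helper could be called as df['summary'] = summarize_links(df['link'], fetch), where fetch is any function that takes a URL and returns the article summary. Because the result has exactly one entry per link, it lines up with the DataFrame rows.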

Or you could define a custom function and then use Series.apply:

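def get_description(url):
    # download, parse and summarize a single article
    art = Article(url)
    art.download()
    art.parse()
    art.nlp()
    return art.summary

# apply to the "link" column, which holds the article URLs
df['summary'] = df['link'].apply(get_description)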
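
Note that apply still processes the links one at a time, so it is not faster than the loop. Since most of the time is presumably spent waiting on downloads, one option is to run several fetches at once with a thread pool. This is only a sketch under that assumption: summarize_concurrently and fake_fetch are hypothetical names, and fake_fetch simulates a slow download so the example runs offline.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def summarize_concurrently(links, fetch_summary, max_workers=8):
    """Fetch summaries for all links using a pool of worker threads.

    Executor.map yields results in the same order as the input, so the
    returned list lines up with the original links. If fetch_summary raises
    for some link, that exception is re-raised when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_summary, links))


def fake_fetch(link):
    """Hypothetical stand-in for a slow article download and summary."""
    time.sleep(0.1)  # simulate network latency
    return f"summary of {link}"


links = [f"https://example.com/article-{i}" for i in range(4)]
results = summarize_concurrently(links, fake_fetch, max_workers=4)
```

With the DataFrame from the question, this could be used as df['summary'] = summarize_concurrently(df['link'], get_description). A small max_workers value is a reasonable starting point, to avoid sending many simultaneous requests to the same site.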