Web scraper does not update/loop properly

Time:07-12

I am trying to make a web scraper that refreshes every 5 seconds, indefinitely, and prints any newly posted article whose title contains specific keywords. However, this code only prints once, when the first matching article is found, and then stops updating, even when a new matching article is posted. I'm not sure how to fix the loop so that it picks up new articles as they are added.

from bs4 import BeautifulSoup
import requests
import time
import webbrowser
from playsound import playsound
import winsound

xml_text = requests.get('https://nypost.com/feed/').text.lower()
soup = BeautifulSoup(xml_text, 'xml')

def find_new(oldlink_main,prev):
    for e in soup.select('item'):
        if ("and") in e.title.text or ("or") in e.title.text or ("man") in e.title.text or ("said") in e.title.text or ("is") in e.title.text or ("the") in e.title.text or ("on") in e.title.text:
            title = e.title.text
            url = e.link.text
            if url != oldlink_main and (url not in prev):
                oldlink = url
                print(title)
                print(e.link.text)
                webbrowser.open(url, new=2)
                winsound.PlaySound("notify.wav", winsound.SND_ALIAS)
            return url
        else:
            print("no")

if __name__ == '__main__':
    ol_news=""
    store_links=[]
    while True:
        newl_news=find_new(ol_news,store_links)
        ol_news=newl_news
        time_wait = 2
        time.sleep(time_wait * 1)

CodePudding user response:

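Your code is not working because the feed is requested only once, at module level, outside of your while loop. Every iteration therefore re-parses the same stale copy of the feed, so a newly posted article can never show up.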

Move the request into your function so the feed is downloaded again on every call:

...
def find_new(oldlink_main,prev):
    # your request to the webpage needs to be here...
    xml_text = requests.get('https://nypost.com/feed/').text.lower()
    soup = BeautifulSoup(xml_text, 'xml')

    for e in soup.select('item'):
        if ("and") in e.title.text or ("or") in e.title.text or ("man") in e.title.text or ("said") in e.title.text or ("is") in e.title.text or ("the") in e.title.text or ("on") in e.title.text:
            title = e.title.text
...
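
For reference, here is a minimal end-to-end sketch of the same idea: fetch inside the loop and remember which links were already reported. It is one possible layout, not the only fix. To stay free of extra installs it swaps in the standard library (urllib.request and xml.etree.ElementTree) for requests and BeautifulSoup. The names fetch_feed, poll and KEYWORDS are made up for this example, and it assumes the feed is plain RSS with <item>, <title> and <link> elements. Only the title is lowercased for matching, so the links keep their original case.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://nypost.com/feed/"
# Keywords copied from the question; matching is by substring, as in the original.
KEYWORDS = ("and", "or", "man", "said", "is", "the", "on")


def fetch_feed(url=FEED_URL):
    """Download the feed again on every call so each poll sees fresh data."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


def find_new(xml_data, seen, keywords=KEYWORDS):
    """Return (title, link) pairs for matching items not reported before.

    `seen` is a set of links; it is updated in place so the next call
    skips everything that was already returned.
    """
    fresh = []
    root = ET.fromstring(xml_data)
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if not link or link in seen:
            continue
        if any(word in title.lower() for word in keywords):
            seen.add(link)
            fresh.append((title, link))
    return fresh


def poll(interval=5):
    """Fetch, report new matches, sleep, repeat."""
    seen = set()
    while True:
        try:
            for title, link in find_new(fetch_feed(), seen):
                print(title)
                print(link)
        except (OSError, ET.ParseError) as exc:
            # Network or parse errors should not end the loop.
            print(f"fetch failed, will retry: {exc}")
        time.sleep(interval)
```

Call poll() to start it. Note that find_new collects every new match instead of returning from inside the for loop, so one match does not hide the rest of the feed, and the set of seen links is actually updated between polls.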