I'm trying to scrape data from a website that shows live football odds. When the odds drop there is a specific change in the page's HTML, and my script should then send a notification to a Telegram bot I've made. Here is my code:
from distutils.command.clean import clean
import time
import requests
from bs4 import BeautifulSoup as bs
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
ids_list=[]
game_urls=[]
game_name=[]
gfix=[]
livecapper_url ="https://livecapper.ru/bet365/" #the website link
while(True):
    page = requests.get(livecapper_url, verify=False).text
    soup = bs(page, "html.parser")
    game_ids = soup.find_all(game_id=True)  # getting the IDs of every football game
    for g in game_ids:
        x = g.get('game_id')
        ids_list.append(x)  # putting the IDs on a list
    for id in ids_list:
        game_url = f"https://livecapper.ru/bet365/event.php?id={id}"  # the URL of every single football game
        game_urls.append(game_url)
    for g in game_urls:
        response = requests.get(g).text
        soup = bs(response, "html.parser")
        for t in soup.find_all("td", class_=['red1', 'red2', 'red3'], limit=1):  # detecting the change in HTML
            for g in soup.find_all("h1"):
                game_name.append(g.get_text()) if g.get_text() not in game_name else game_name
    for f in game_name:
        game_url = 'https://api.telegram.org/botTOKEN/sendMessage?chat_id=-609XXXXXX&text=Fixed Alert : {}'.format(f)  # sending notification to telegram bot
        if game_url not in gfix:
            gfix.append(game_url)
            requests.get(game_url)
        else:
            pass
    ids_list.clear
    game_name.clear
    game_urls.clear
    time.sleep(1)
As you can see, I'm using a while(True): loop to run the code 24/7, but the problem is that each iteration takes roughly twice as long as the previous one,
e.g. 1st iteration = 10s | 2nd iteration = 20s | 3rd iteration = 40s | 4th iteration = 80s.
What can I do to keep every iteration as fast as possible?
CodePudding user response:
Change these:
ids_list.clear
game_name.clear
game_urls.clear
to:
ids_list.clear()
game_name.clear()
game_urls.clear()
Without the parentheses, you aren't calling the methods, but are merely accessing them and then discarding them (i.e., it does nothing).
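A minimal demonstration of the difference, using a plain list rather than the scraper's variables:

```python
items = [1, 2, 3]

# Without parentheses this only looks up the bound method object
# and immediately discards it; the list is untouched.
items.clear
assert items == [1, 2, 3]

# With parentheses the method is actually called and the list is emptied.
items.clear()
assert items == []
```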
CodePudding user response:
There are quite a few issues with the code, but ultimately the reason each pass takes longer is that you keep appending to your lists, so after every iteration those lists grow bigger and bigger (duplicates included). There are a few things you could do:
- Initialize those empty lists inside your loop
- Remove duplicates from the lists so the same URL isn't requested multiple times in each iteration
- Call .clear() correctly (with parentheses)
I simply did the first, since it looks like what you want is to start each iteration with clean lists.
from distutils.command.clean import clean
import time
import requests
from bs4 import BeautifulSoup as bs
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
gfix=[]
livecapper_url ="https://livecapper.ru/bet365/" #the website link
while(True):
    ids_list = []
    game_urls = []
    game_name = []
    page = requests.get(livecapper_url, verify=False).text
    soup = bs(page, "html.parser")
    game_ids = soup.find_all(game_id=True)  # getting the IDs of every football game
    for g in game_ids:
        x = g.get('game_id')
        ids_list.append(x)  # putting the IDs on a list
    for id in ids_list:
        game_url = f"https://livecapper.ru/bet365/event.php?id={id}"  # the URL of every single football game
        game_urls.append(game_url)
    for g in game_urls:
        response = requests.get(g).text
        soup = bs(response, "html.parser")
        for t in soup.find_all("td", class_=['red1', 'red2', 'red3'], limit=1):  # detecting the change in HTML
            for g in soup.find_all("h1"):
                game_name.append(g.get_text()) if g.get_text() not in game_name else game_name
    for f in game_name:
        game_url = 'https://api.telegram.org/botTOKEN/sendMessage?chat_id=-609XXXXXX&text=Fixed Alert : {}'.format(f)  # sending notification to telegram bot
        if game_url not in gfix:
            gfix.append(game_url)
            requests.get(game_url)
        else:
            pass
    time.sleep(1)
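If you also want the second suggestion (removing duplicates), a small helper is enough; this is just a sketch with made-up example IDs, not part of the original script:

```python
def unique_ids(ids):
    """Remove duplicate IDs while preserving first-seen order.

    dict.fromkeys keeps only the first occurrence of each key and,
    since Python 3.7, dicts preserve insertion order.
    """
    return list(dict.fromkeys(ids))

# Scraped IDs often repeat across page elements:
print(unique_ids(['101', '102', '101', '103', '102']))  # ['101', '102', '103']
```

Applying this to ids_list before building game_urls means each event page is requested at most once per iteration.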