I have a web scraping function that fetches data from 190 URLs. To speed it up, I use concurrent.futures.ThreadPoolExecutor, and I save the fetched data to a SQL Server database. I need to repeat this whole process every 3 minutes from 9 AM to 4 PM. But when I wrap it in a while loop or a scheduler, the concurrent.futures part does not work: there is no error and no output.
# required libraries
import concurrent.futures
import time
import requests

urls = []  # list of the 190 URLs

def data_fetched(url):
    # data fetching
    # operations on data
    # data saving to SQL server
    return ''

while True:
    with concurrent.futures.ThreadPoolExecutor() as executor:
        executor.map(data_fetched, urls)
    time.sleep(60)
I want to repeat all of this every 3 minutes; the code above shows the flow. Please help me schedule it.
import datetime
import time
from datetime import datetime as dt, timedelta

start = dt.strptime("09:15:00", "%H:%M:%S")
end = dt.strptime("15:30:00", "%H:%M:%S")
# min_gap
min_gap = 3
# compute datetime interval (HH:MM only, since the loop below checks once a minute)
arr = [(start + timedelta(hours=min_gap*i/60)).strftime("%H:%M")
       for i in range(int((end-start).total_seconds() / 60.0 / min_gap))]

while True:
    weekno = datetime.datetime.today().weekday()
    now = dt.now()  # gets current datetime
    # zero-padded HH:MM so it can match the entries in arr
    current_time = now.strftime("%H:%M")
    # checks if current time is in the hours list
    if weekno < 5 and current_time in arr:
        print('data_loaded')
    else:  # 5 Sat, 6 Sun
        pass
    time.sleep(60)
So inside this while loop I want to call that function using concurrent.futures.
CodePudding user response:
You can create a separate function that runs data_fetched() over the URLs and schedule that function. I hope your urls variable contains the list of URLs and is not an empty list.
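Before scheduling anything, it may help to confirm that the worker function is really running. executor.map() only re-raises a worker's exception when its results are iterated, so a failing data_fetched() can look like "no error and no output". The sketch below is one way to surface those failures using submit() and as_completed(). The run_batch helper, the max_workers value, and the body of data_fetched are illustrative assumptions, not part of the original code.

```python
import concurrent.futures


def data_fetched(url):
    # Hypothetical stand-in for the real fetch / parse / save-to-SQL logic.
    if not url.startswith("http"):
        raise ValueError(f"bad url: {url}")
    return f"ok: {url}"


def run_batch(urls, max_workers=20):
    """Run data_fetched over urls; return (results, errors), both keyed by URL."""
    results, errors = {}, {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Remember which URL each future belongs to so failures can be reported.
        future_to_url = {executor.submit(data_fetched, u): u for u in urls}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                # result() re-raises any exception raised inside the worker.
                results[url] = future.result()
            except Exception as exc:
                errors[url] = exc
    return results, errors
```

Calling run_batch(urls) once by hand and printing the errors dictionary should show whether the problem is in the scraping function itself rather than in the loop or scheduler. An empty urls list simply returns two empty dictionaries.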
from schedule import every, repeat, run_pending
import concurrent.futures
import time
import requests

urls = []

def data_fetched(url):
    # data fetching
    # operations on data
    # data saving to SQL server
    return ''

@repeat(every(3).minutes)
def execute_script():
    with concurrent.futures.ThreadPoolExecutor() as executor:
        executor.map(data_fetched, urls)

while True:
    run_pending()
    time.sleep(1)
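The scheduled job above runs every 3 minutes around the clock. To restrict it to weekdays within the hours from the question, one possible approach is a small time-window check, sketched here with only the standard library. The 09:15 and 15:30 bounds are taken from the question's code; the names in_window, seconds_until_next_slot, and run_forever are hypothetical helpers, and the slot calculation assumes the gap divides 60 evenly.

```python
import time
from datetime import datetime, time as dtime

WINDOW_START = dtime(9, 15)   # start time used in the question's code
WINDOW_END = dtime(15, 30)    # end time used in the question's code


def in_window(now):
    """True on Monday-Friday between WINDOW_START and WINDOW_END inclusive."""
    return now.weekday() < 5 and WINDOW_START <= now.time() <= WINDOW_END


def seconds_until_next_slot(now, gap_minutes=3):
    """Seconds from now until the next wall-clock multiple of gap_minutes."""
    elapsed = (now.minute % gap_minutes) * 60 + now.second + now.microsecond / 1e6
    return gap_minutes * 60 - elapsed


def run_forever(job, gap_minutes=3):
    """Call job() on every slot inside the window; sleep between slots."""
    while True:
        if in_window(datetime.now()):
            job()
        # Recompute after the job so a slow run does not shift later slots.
        time.sleep(seconds_until_next_slot(datetime.now(), gap_minutes))
```

With this sketch, run_forever(execute_script) would replace the run_pending() loop, provided execute_script is defined without the @repeat decorator. Alternatively, the in_window(datetime.now()) check could be placed at the top of execute_script while keeping the schedule-based loop unchanged.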