I need to create a web-scraping program where I pass a value from a list (i.e. an integer indicating the number of clicks) to a function. If the function doesn't succeed, I need to store that value and then re-run the function on the unsuccessful values until they all succeed (or at least for n trials). I only have pseudo-code of what I'm thinking, since I'm not sure how to do this:
first_ind = [1,2,3,...]
error_ind = []
#here there may be a loop for n trials
for i in first_ind:
    try:
        some_scrape_function(i) #returning some list of success values
    except:
        error_ind.append(i)
#here I don't know how to re-run the function over a list that is at every iteration potentially smaller,
#until potentially null.
while new_error_ind:
    new_error_ind = []
    for i in error_ind:
        try:
            some_scrape_function(i)
            return success_list_i
        except:
            new_error_ind.append(i)
In this last part, how can I make sure the function is re-run until success is obtained for all values?
CodePudding user response:
Keep a list of failures and keep looping until they pass or the retries are exhausted:
import random

def simulate_scrape(n):  # fail 20% of the time
    if random.random() >= .8:
        raise RuntimeError('failed')
    return True

def do_scrape(indexes, tries):
    success = []
    unsuccessful = indexes.copy()  # don't mutate passed-in list
    # empty containers and zero values are treated as false in Python,
    # so if there are items in the list and tries is not zero, process...
    while unsuccessful and tries:
        failed = []
        for n in unsuccessful:
            try:
                simulate_scrape(n)  # raises an exception on failure, as in the question
            except RuntimeError:
                failed.append(n)
            else:
                success.append(n)
        unsuccessful = failed  # transfer the failed list for next round
        print(f'failed: {unsuccessful}')
        tries -= 1
    return success, unsuccessful

to_do = list(range(50))  # all indexes 0-49 start unsuccessful
passed, failed = do_scrape(to_do, 3)
print(f'FINAL {passed=}\n'
      f' {failed=}\n')
Sample output from three separate runs (each run prints one "failed:" line per round, then the final lists):
failed: [3, 10, 13, 16, 21, 24, 27, 31]
failed: [3, 27]
failed: []
FINAL passed=[0, 1, 2, 4, 5, 6, 7, 8, 9, 11, 12, 14, 15, 17, 18, 19, 20, 22, 23, 25, 26, 28, 29, 30, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 10, 13, 16, 21, 24, 31, 3, 27]
failed=[]
failed: [0, 5, 6, 38, 40, 49]
failed: [0]
failed: []
FINAL passed=[1, 2, 3, 4, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 39, 41, 42, 43, 44, 45, 46, 47, 48, 5, 6, 38, 40, 49, 0]
failed=[]
failed: [2, 5, 9, 10, 14, 20, 23, 28, 49]
failed: [9, 10, 14, 23]
failed: [9]
FINAL passed=[0, 1, 3, 4, 6, 7, 8, 11, 12, 13, 15, 16, 17, 18, 19, 21, 22, 24, 25, 26, 27, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 2, 5, 20, 28, 49, 10, 14, 23]
failed=[9]
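The question mentions that the scrape function returns a list of success values, while the code above only records which indexes passed. A minimal sketch of one way to keep the returned values as well is to store them in a dict keyed by index. The names retry_scrape, flaky_scrape and failures_left are hypothetical and used only for illustration; flaky_scrape stands in for the real scraper and fails a fixed number of times per index so the result is reproducible. Catching Exception broadly is an assumption; narrowing it to the errors the real scraper raises would likely be better.

```python
def retry_scrape(scrape, indexes, tries):
    """Call scrape(i) for each index, retrying failures for up to `tries` rounds.

    Returns (results, pending): results maps each successful index to the
    value scrape returned; pending lists the indexes that never succeeded.
    """
    results = {}
    pending = list(indexes)  # copy so the caller's list is not mutated
    for _ in range(tries):
        if not pending:
            break  # everything succeeded, no more rounds needed
        failed = []
        for i in pending:
            try:
                results[i] = scrape(i)  # stored only if no exception is raised
            except Exception:  # assumption: any exception means "retry later"
                failed.append(i)
        pending = failed  # the next round only sees what failed this round
    return results, pending


# Hypothetical stand-in for the real scraper: each index fails a fixed
# number of times before succeeding, so the run is deterministic.
failures_left = {1: 0, 2: 1, 3: 5}

def flaky_scrape(i):
    if failures_left[i] > 0:
        failures_left[i] -= 1
        raise RuntimeError(f'failed on {i}')
    return [i * 10]


results, pending = retry_scrape(flaky_scrape, [1, 2, 3], tries=3)
print(results)  # {1: [10], 2: [20]}
print(pending)  # [3]
```

Index 3 is still pending because it needs more than three attempts; raising tries, or pausing between rounds, are the usual knobs to adjust for a real site.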