GitHub allows no more than 2500 requests per hour. If I have several accounts/tokens, how can I set up automatic token switching in Scrapy, so that the token changes either when a certain number of requests is reached (for example 2500) or when a response comes back with status 403?
import scrapy
from scrapy import Request


class GithubSpider(scrapy.Spider):
    name = 'github.com'
    start_urls = ['https://github.com']
    tokens = ['token1', 'token2', 'token3', 'token4']
    # The same token is used for every request
    headers = {
        'Accept': 'application/vnd.github.v3+json',
        'Authorization': 'token ' + tokens[1],
    }

    def start_requests(self, **cb_kwargs):
        # languages and country are defined elsewhere
        for lang in languages:
            cb_kwargs['lang'] = lang
            url = f'https://api.github.com/search/users?q=language:{lang}+location:{country}&per_page=100'
            yield Request(url=url, headers=self.headers, callback=self.parse, cb_kwargs=cb_kwargs)
CodePudding user response:
You could use the cycle function from the itertools module to create an iterator over your list of tokens that repeats endlessly. Calling next() on it for each request you send spreads the requests evenly across all the tokens, which reduces the chance of any single token reaching its limit. If you then start receiving 403 responses, it most likely means that all the tokens have reached their limit. See the sample code below:
from itertools import cycle

import scrapy
from scrapy import Request


class GithubSpider(scrapy.Spider):
    name = 'github.com'
    start_urls = ['https://github.com']
    # Endless iterator: token1, token2, token3, token4, token1, ...
    tokens = cycle(['token1', 'token2', 'token3', 'token4'])

    def start_requests(self, **cb_kwargs):
        # languages and country are defined elsewhere
        for lang in languages:
            # Build fresh headers with the next token for every request
            headers = {
                'Accept': 'application/vnd.github.v3+json',
                'Authorization': 'token ' + next(self.tokens),
            }
            cb_kwargs['lang'] = lang
            url = f'https://api.github.com/search/users?q=language:{lang}+location:{country}&per_page=100'
            yield Request(url=url, headers=headers, callback=self.parse, cb_kwargs=cb_kwargs)
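
The code above switches tokens on every request. If you would rather stay on one token until a fixed number of requests has been sent, or until a 403 comes back, a small helper along these lines might work. This is only a sketch: TokenPool, max_requests and report_status are hypothetical names made up for illustration (they are not part of Scrapy), and the default budget of 2500 is simply the figure from the question.

```python
from itertools import cycle


class TokenPool:
    """Hypothetical helper: hands out one token until its budget is spent
    or it is reported as rejected, then moves on to the next one."""

    def __init__(self, tokens, max_requests=2500):
        if not tokens:
            raise ValueError("at least one token is required")
        self._cycle = cycle(tokens)
        self.max_requests = max_requests  # assumed per-token budget
        self.current = next(self._cycle)
        self.used = 0  # requests sent with the current token

    def rotate(self):
        """Switch to the next token and reset the request counter."""
        self.current = next(self._cycle)
        self.used = 0
        return self.current

    def headers(self):
        """Return headers for one request, rotating first if the budget is spent."""
        if self.used >= self.max_requests:
            self.rotate()
        self.used += 1
        return {
            "Accept": "application/vnd.github.v3+json",
            "Authorization": "token " + self.current,
        }

    def report_status(self, status, token):
        """Rotate on a 403, but only if the rejected token is still the active one."""
        if status == 403 and token == self.current:
            self.rotate()


# Small demonstration with a budget of 2 requests per token
pool = TokenPool(["token1", "token2", "token3"], max_requests=2)
first = [pool.headers()["Authorization"] for _ in range(3)]
pool.report_status(403, "token2")  # pretend token2 was rejected
after_403 = pool.headers()["Authorization"]
print(first, after_403)
```

In the spider you would create one pool, call pool.headers() in start_requests instead of next(self.tokens), and add handle_httpstatus_list = [403] to the spider so that 403 responses reach your callback. There you can call pool.report_status(response.status, ...) with the token that was sent and yield the request again with fresh headers. Keep in mind that rotating does not reset the limit on the exhausted token, so once every token has been used up you still have to wait.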