As far as I can see, when the language button is pressed, this website https://www.learnit.nl/ fetches the English version by sending a POST request to https://cdn-api-weglot.com/translate?api_key=wg_6199f2422428fc4285eb776a1ab915c08&v=1, and I don't know how to replicate that with Scrapy. I'd appreciate any help.
The data comes from an API call: the page sends a POST request whose payload is a large JSON object, and the translated strings come back in the JSON response. To replicate this with Scrapy, you can follow this example:
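import json
import scrapy


class CourseSpider(scrapy.Spider):
    name = 'course'

    # Paste the JSON payload of the POST request here (copy it from the
    # browser's developer tools, Network tab, after pressing the language button)
    body = {}

    def start_requests(self):
        yield scrapy.Request(
            url='https://cdn-api-weglot.com/translate?api_key=wg_6199f2422428fc4285eb776a1ab915c08&v=1',
            callback=self.parse,
            body=json.dumps(self.body),
            method="POST",
            headers={
                # Tell the API that the request body is JSON
                'Content-Type': 'application/json',
            }
        )

    def parse(self, response):
        data = json.loads(response.body)
        # 'to_words' holds the translated (English) strings
        for word in data['to_words']:
            yield {
                'course': word
            }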
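If the spider yields nothing, it can help to check the payload outside Scrapy first. The sketch below sends the same JSON POST using only the standard library. `build_translate_request`, `extract_courses` and `fetch_courses` are hypothetical helper names, and, like the spider above, it assumes the response JSON carries the translations in a `to_words` list.

```python
import json
import urllib.request

# Endpoint copied from the question; the api_key belongs to the site's
# Weglot widget and may change at any time.
TRANSLATE_URL = (
    "https://cdn-api-weglot.com/translate"
    "?api_key=wg_6199f2422428fc4285eb776a1ab915c08&v=1"
)


def build_translate_request(payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to the translate endpoint."""
    return urllib.request.Request(
        TRANSLATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_courses(data: dict) -> list:
    """Turn the decoded response into one item per translated string.

    Assumes the translations are in data['to_words'], as in the spider.
    """
    return [{"course": word} for word in data.get("to_words", [])]


def fetch_courses(payload: dict) -> list:
    """Send the request and parse the JSON response (performs network I/O)."""
    with urllib.request.urlopen(build_translate_request(payload), timeout=30) as resp:
        return extract_courses(json.load(resp))
```

Call `fetch_courses(payload)` with the same dict you use as the spider's `body`. If it fails here too, the payload or the api_key is the likely culprit rather than Scrapy.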
Output:
{'course': 'Writing clear texts'}
2022-04-28 22:03:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cdn-api-weglot.com/translate?api_key=wg_6199f2422428fc4285eb776a1ab915c08&v=1>
{'course': 'HTML e-mail'}
2022-04-28 22:03:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cdn-api-weglot.com/translate?api_key=wg_6199f2422428fc4285eb776a1ab915c08&v=1>
{'course': 'HTML and CSS Basics'}
2022-04-28 22:03:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cdn-api-weglot.com/translate?api_key=wg_6199f2422428fc4285eb776a1ab915c08&v=1>
{'course': 'HTML and CSS Continued'}
2022-04-28 22:03:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cdn-api-weglot.com/translate?api_key=wg_6199f2422428fc4285eb776a1ab915c08&v=1>
{'course': 'HTML Training E-learning'}
'downloader/response_status_count/200': 1,
'elapsed_time_seconds': 1.879555,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2022, 4, 28, 16, 3, 22, 536326),
'httpcompression/response_bytes': 36269,
'httpcompression/response_count': 1,
'item_scraped_count': 514,
... and so on
The payload is a large JSON object that exceeds the answer's length limit, so it is not reproduced here; the full working code was provided via an external link.
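Because the payload is too big to paste inline, one option is to keep it in a separate file and load it when the spider class is defined. This is a minimal sketch; `load_payload` and the file name `payload.json` are hypothetical, and the file is assumed to contain the request payload copied from the browser's developer tools.

```python
import json
from pathlib import Path


def load_payload(path: str) -> dict:
    """Read the POST body saved from the browser's Network tab.

    Raises ValueError if the file does not contain a JSON object.
    """
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(payload, dict):
        raise ValueError(
            f"expected a JSON object in {path}, got {type(payload).__name__}"
        )
    return payload
```

In the spider, the `body` class attribute could then be set with `body = load_payload('payload.json')`, keeping the large JSON out of the source file.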