Scrapy - How does a request sent using requests library to an API differ from the request that is s

Time:11-30

I am a beginner at using Scrapy and I was trying to scrape this website https://directory.ntschools.net/#/schools which uses JavaScript to load its contents. So I checked the Network tab and found that there's an API address available (https://directory.ntschools.net/api/System/GetAllSchools). If you open this address in the browser, the data is in XML format. But when you check the Response tab while inspecting the request in the Network tab, the data is there in JSON format.

I first tried using Scrapy: I sent the request to the API address WITHOUT any headers, and the response it returned was XML, which threw a JSONDecodeError when passed to json.loads(). So I added the header 'Accept': 'application/json', and the response I got was JSON. That worked well:

import json

import scrapy


class NtseSpider_new(scrapy.Spider):
    name = 'ntse_new'
    # Explicitly ask the API for JSON
    header = {
        'Accept': 'application/json',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36 Edg/107.0.1418.56',
    }

    def start_requests(self):
        yield scrapy.Request(
            'https://directory.ntschools.net/api/System/GetAllSchools',
            callback=self.parse,
            headers=self.header,
        )

    def parse(self, response):
        data = json.loads(response.body)  # returned JSON response

But then I used the requests module WITHOUT any headers and the response I got was in JSON too!

import json

import requests

res = requests.get('https://directory.ntschools.net/api/System/GetAllSchools')
js = json.loads(res.content)  # returned JSON response

Can anyone please tell me if there's any difference between the two types of requests? Is there a default response format for the requests module when making a request to an API? Surely I am missing something. Thanks

CodePudding user response:

It's because Scrapy sets the Accept header to 'text/html,application/xhtml+xml,application/xml ...' by default. You can see that in Scrapy's DEFAULT_REQUEST_HEADERS setting.
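
If every request in a project should ask for JSON, one option is to override that default instead of passing headers on each request. A minimal sketch, assuming a standard Scrapy project with a settings.py file; the 'Accept-Language' entry simply mirrors Scrapy's stock default and is not required by the API in the question:

```python
# settings.py (project-level Scrapy settings)
# Assumption: every spider in this project talks to JSON APIs.
# Headers passed to an individual scrapy.Request still take precedence.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "application/json",
    "Accept-Language": "en",
}
```

The same dictionary can also be set for a single spider through its custom_settings class attribute, under the 'DEFAULT_REQUEST_HEADERS' key.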

I experimented and found that the server sends a JSON response if the request has no Accept header.
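
This also fits what the requests library does: by default it sends 'Accept: */*', which expresses no preference between JSON and XML. The sketch below is a hypothetical model of server-side content negotiation, not the actual logic of the server in the question. It assumes the server supports JSON and XML, prefers JSON on ties, and falls back to JSON when nothing matches. The names SUPPORTED, parse_accept, match_q and negotiate are made up for illustration.

```python
# Hypothetical model of how a server might choose a response format from
# the Accept header. The real server's logic is not known.
SUPPORTED = ["application/json", "application/xml"]  # assumed order: JSON first

SCRAPY_DEFAULT_ACCEPT = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
REQUESTS_DEFAULT_ACCEPT = "*/*"


def parse_accept(header):
    """Return (media_type, q) pairs from a well-formed Accept header value."""
    items = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        media_type, *params = [p.strip() for p in part.split(";")]
        q = 1.0  # a missing q parameter means full preference
        for param in params:
            if param.startswith("q="):
                q = float(param[2:])
        items.append((media_type.lower(), q))
    return items


def match_q(media_type, accepted):
    """Return the q of the most specific accepted range that matches."""
    main = media_type.split("/")[0]
    for candidate in (media_type, main + "/*", "*/*"):
        for accepted_type, q in accepted:
            if accepted_type == candidate:
                return q
    return 0.0  # not acceptable to the client


def negotiate(accept_header):
    """Pick the supported type with the highest q; ties go to server order."""
    if not accept_header:
        return SUPPORTED[0]  # no Accept header: server default (assumed JSON)
    accepted = parse_accept(accept_header)
    best_type, best_q = None, 0.0
    for supported in SUPPORTED:
        q = match_q(supported, accepted)
        if q > best_q:
            best_type, best_q = supported, q
    return best_type or SUPPORTED[0]  # assumed fallback when nothing matches


print(negotiate(SCRAPY_DEFAULT_ACCEPT))    # application/xml
print(negotiate(REQUESTS_DEFAULT_ACCEPT))  # application/json
print(negotiate(None))                     # application/json
```

Under these assumptions, Scrapy's default header ranks application/xml (q=0.9) above the '*/*' wildcard (q=0.8), so XML wins, while a bare '*/*' or a missing header leaves the choice to the server's own default.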
