Scrapy not identifying key from json

Time:08-31

I am trying to scrape information about biblical commentaries from a website. Below is the code I wrote to do so. start_urls points to the JSON endpoint I am trying to scrape. I used ['0']['father']['_id'] to get the name of the commenter, but the following error occurs. What should I do?

Error: TypeError: list indices must be integers or slices, not str

Code:

import scrapy
import json

class catenaspider(scrapy.Spider): #spider to crawl the url
    name = 'commentary' #name to be called in command terminal
    start_urls = ['https://api.catenabible.com:8080/anc_com/c/mt/1/1?tags=["ALL"]&sort=def']

    def parse(self,response):
        data = json.loads(response.body)
        yield from data['0']['father']['_id']

CodePudding user response:

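The error message tells you that data is a list: the JSON response is a top-level array, so index it with the integer 0 rather than the string '0'. Also yield a dict instead of using yield from on the value, because yield from would iterate over the string one character at a time, and Scrapy expects items or requests from parse. response.json() (available since Scrapy 2.2) can replace json.loads(response.body).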

import scrapy


class catenaspider(scrapy.Spider):  # spider to crawl the url
    name = 'commentary' # name to be called in command terminal
    start_urls = ['https://api.catenabible.com:8080/anc_com/c/mt/1/1?tags=["ALL"]&sort=def']

    def parse(self, response):
        data = response.json()
        yield {'id_father': data[0]['father']['_id']}
        # if you want to get all the id's
        # for d in data:
        #     yield {'id_father': d['father']['_id']}
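
To see the difference without running the spider, here is a minimal sketch using only the standard library. The sample_body string is a made-up payload that only assumes the response is a JSON array of objects with a father._id field; the real API returns more fields and different values.

```python
import json

# Hypothetical payload: a top-level JSON array, as the TypeError implies.
sample_body = '[{"father": {"_id": "father-a"}}, {"father": {"_id": "father-b"}}]'

data = json.loads(sample_body)  # a JSON array decodes to a Python list

# Indexing a list with a string reproduces the error from the question.
try:
    data['0']
except TypeError as exc:
    error_message = str(exc)

# Integer indices select list elements; string keys select dict entries.
first_id = data[0]['father']['_id']

# Collect the id of every entry in the array.
all_ids = [entry['father']['_id'] for entry in data]

print(error_message)
print(first_id)
print(all_ids)
```

In the spider, the same loop would yield one dict per entry, as in the commented-out lines above.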