I am trying to scrape information about biblical commentaries from a website. Below is the code I wrote to do so. start_urls
is the link to the JSON file I am trying to scrape. I used ['0']['father']['_id']
to get the name of the commenter, but the following error occurs. What should I do?
Error: TypeError: list indices must be integers or slices, not str
Code:
import scrapy
import json

class catenaspider(scrapy.Spider):  # spider to crawl the url
    name = 'commentary'  # name to be called in command terminal
    start_urls = ['https://api.catenabible.com:8080/anc_com/c/mt/1/1?tags=["ALL"]&sort=def']

    def parse(self, response):
        data = json.loads(response.body)
        yield from data['0']['father']['_id']
CodePudding user response:
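Read the documentation again. The API returns a JSON array, so the parsed data is a Python list and must be indexed with the integer 0, not the string '0'. A Scrapy callback should also yield a dict (or item) rather than a bare string, and response.json() parses the body without needing the json module.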
import scrapy

class catenaspider(scrapy.Spider):  # spider to crawl the url
    name = 'commentary'  # name to be called in command terminal
    start_urls = ['https://api.catenabible.com:8080/anc_com/c/mt/1/1?tags=["ALL"]&sort=def']

    def parse(self, response):
        data = response.json()
        yield {'id_father': data[0]['father']['_id']}

        # if you want to get all the id's
        # for d in data:
        #     yield {'id_father': d['father']['_id']}
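
To see why the string index fails without running the spider, here is a minimal sketch that uses only the standard library. The payload in sample_body is made up for illustration and only assumes the response is a JSON array of objects with a father._id field; the id values and the helper names first_father_id and all_father_ids are hypothetical, not part of the API or of Scrapy.

```python
import json

# Hypothetical payload shaped like the API response: a JSON array of objects.
# The id values are placeholders, not real data from the site.
sample_body = '[{"father": {"_id": "father-a"}}, {"father": {"_id": "father-b"}}]'

data = json.loads(sample_body)  # a JSON array parses to a Python list


def first_father_id(items):
    """Return the first commenter id, indexing the list with an integer."""
    return items[0]['father']['_id']


def all_father_ids(items):
    """Return the commenter id of every entry in the list."""
    return [item['father']['_id'] for item in items]


error_message = None
try:
    data['0']  # a string index on a list reproduces the error from the question
except TypeError as exc:
    error_message = str(exc)

print(first_father_id(data))
print(all_father_ids(data))
print(error_message)
```

In the spider itself, the same integer indexing applies to whatever response.json() returns, and each value is wrapped in a dict before being yielded.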