Scrapy DEBUG: Filtered offsite request

Time:01-17

allowed_domains = ['www.google.com','google.com',]
start_urls = ['https://www.google.com/search?q=mobiles&tbm=pts&sxsrf=AJOqlzXrlIIii_GtGMCheGMJHKPpQl1hLw:1673692348905&source=hp&ei=vITCY_2YNOKVxc8P79uA2A8&iflsig=AK50M_UAAAAAY8KSzHAkD8f8N_ul8boy27FJhuidI9c7&ved=0ahUKEwj95qrv7cb8AhXiSvEDHe8tAPsQ4dUDCAg&uact=5&oq=mobiles&gs_lcp=Cg9nd3Mtd2l6LXBhdGVudHMQAzIECCMQJzIFCAAQkQIyBAgAEEMyCggAEIAEEIcCEBQyCAgAEIAEELEDMggIABCABBCxAzILCAAQgAQQsQMQgwEyCAgAEIAEELEDMggIABCABBCxAzILCAAQgAQQsQMQyQM6CAgAELEDEIMBOgUIABCABDoFCAAQsQM6BQgAEJIDUABYygxg1g1oAHAAeACAAfADiAG4DpIBAzQtNJgBAKABAQ&sclient=gws-wiz-patents']

These are my parse and other_link methods:

    def parse(self, response):
        # Title and link of the first search result
        title = response.xpath("//div[@class='yuRUbf']/a/h3/text()").extract_first()
        related_data = response.xpath("//div[@class='yuRUbf']/a/@href").get()

        # Follow the result link; this is the request that gets filtered as offsite
        yield response.follow(url=related_data, callback=self.other_link)

    def other_link(self, response):
        # Abstract text on the patent page
        heading = response.xpath("//div[@class='abstract style-scope patent-text']/text()").get()

        yield {
            'heading': heading
        }

This is the log output I get:

DEBUG: Crawled (200) <GET https://www.google.com/search?q=mobiles&tbm=pts&sxsrf=AJOqlzXrlIIii_GtGMCheGMJHKPpQl1hLw:1673692348905&source=hp&ei=vITCY_2YNOKVxc8P79uA2A8&iflsig=AK50M_UAAAAAY8KSzHAkD8f8N_ul8boy27FJhuidI9c7&ved=0ahUKEwj95qrv7cb8AhXiSvEDHe8tAPsQ4dUDCAg&uact=5&oq=mobiles&gs_lcp=Cg9nd3Mtd2l6LXBhdGVudHMQAzIECCMQJzIFCAAQkQIyBAgAEEMyCggAEIAEEIcCEBQyCAgAEIAEELEDMggIABCABBCxAzILCAAQgAQQsQMQgwEyCAgAEIAEELEDMggIABCABBCxAzILCAAQgAQQsQMQyQM6CAgAELEDEIMBOgUIABCABDoFCAAQsQM6BQgAEJIDUABYygxg1g1oAHAAeACAAfADiAG4DpIBAzQtNJgBAKABAQ&sclient=gws-wiz-patents> (referer: None)
2023-01-14 16:43:26 [scrapy.spidermiddlewares.offsite] DEBUG: Filtered offsite request to 'www.google.com.pk': <GET https://www.google.com.pk/patents/WO2006010333A1?cl=en&dq=mobiles&hl=en&sa=X&ved=2ahUKEwiCmP_c_cb8AhW-qZUCHW4ZABYQ6AF6BAgFEAM>
2023-01-14 16:43:26 [scrapy.core.engine] INFO: Closing spider (finished)
2023-01-14 16:43:26 [scrapy.statscollectors] INFO: Dumping Scrapy stats:

Can you please help me?

CodePudding user response:

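allowed_domains = ['www.google.com', 'google.com', 'google.com.pk']

This should work. The link you follow points to www.google.com.pk, which is not covered by allowed_domains, so Scrapy's offsite middleware drops the request before it is sent. Add that domain to allowed_domains as a bare domain name: no scheme (https://) and no leading space, since the entries are matched against the request's host name, not against a full URL. A domain entry also covers its subdomains, so 'google.com.pk' is enough to allow www.google.com.pk.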
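To see why the entry has to be a bare domain, here is a minimal sketch of host matching against allowed_domains. It is a simplified stand-in written for illustration, not Scrapy's actual middleware code; the helper names build_host_regex and is_onsite are made up here, and the patent URL is the one from your log with the query string trimmed.

```python
import re
from urllib.parse import urlparse


def build_host_regex(allowed_domains):
    # Hypothetical helper: match each allowed domain itself or any subdomain of it.
    escaped = "|".join(re.escape(d) for d in allowed_domains)
    return re.compile(rf"^(.*\.)?({escaped})$")


def is_onsite(url, allowed_domains):
    # Only the host name of the URL is compared, never the scheme or path.
    host = urlparse(url).hostname or ""
    return bool(build_host_regex(allowed_domains).search(host))


original = ["www.google.com", "google.com"]
with_url_entry = original + [" https://www.google.com.pk"]  # URL with a leading space
fixed = original + ["google.com.pk"]  # bare domain

patent_url = "https://www.google.com.pk/patents/WO2006010333A1?cl=en"

print(is_onsite(patent_url, original))        # host ends in .pk, so no entry matches
print(is_onsite(patent_url, with_url_entry))  # a URL entry never equals a host name
print(is_onsite(patent_url, fixed))           # www.google.com.pk is a subdomain of google.com.pk
```

Under these assumptions only the last list lets the patent link through, which is why the scheme and the leading space have to go.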
