Spider in Django views

Time:01-04

I want to run a Scrapy spider from a Django view. I tried both CrawlerRunner and CrawlerProcess, but ran into problems: views are synchronous, and the crawler does not return a response directly.

I tried a few ways:

# Core imports.
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Third-party imports.
from rest_framework.views import APIView
from rest_framework.response import Response

# Local imports.
from scrapy_project.spiders.google import GoogleSpider


class ForFunAPIView(APIView):
    def get(self, *args, **kwargs):
        process = CrawlerProcess(get_project_settings())
        process.crawl(GoogleSpider)
        process.start()
        return Response('ok')

Is there a way to handle this and run the spider directly from other scripts or projects, without using a DjangoItem pipeline?

CodePudding user response:

You didn't really specify what the problems are, but I guess you need to return the Response immediately and leave the heavy call to run in the background. You can change your code as follows to use the threading module:

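from threading import Thread

class ForFunAPIView(APIView):
    def get(self, *args, **kwargs):
        process = CrawlerProcess(get_project_settings())
        process.crawl(GoogleSpider)

        # Run the blocking reactor loop in a background thread so the view can
        # return right away. Signal handlers can only be installed from the main
        # thread, so they are disabled here (this assumes a Scrapy version whose
        # start() accepts install_signal_handlers).
        thread = Thread(
            target=process.start,
            kwargs={'install_signal_handlers': False},
        )
        thread.start()

        # Caveat: the Twisted reactor cannot be restarted, so a second request
        # in the same process will fail with ReactorNotRestartable.
        return Response('ok')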
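
Another option, not taken from the answers here, is to launch each crawl in its own child process. Twisted's reactor cannot be restarted once it has stopped, so starting a fresh process per crawl sidesteps that limit and also keeps the view non-blocking. The following is only a minimal sketch: the helper names build_crawl_command and start_crawl are made up for illustration, and the spider name "google" is an assumption, since the name attribute of GoogleSpider is not shown in the question.

```python
import subprocess
import sys


def build_crawl_command(spider_name):
    """Return the argv list that runs one spider through the Scrapy CLI."""
    # "python -m scrapy crawl <name>" uses the same interpreter (and therefore
    # the same virtualenv) as the web process.
    return [sys.executable, "-m", "scrapy", "crawl", spider_name]


def start_crawl(spider_name, project_dir):
    """Launch the crawl in a child process and return without waiting for it."""
    # project_dir is assumed to be the directory that contains scrapy.cfg.
    return subprocess.Popen(build_crawl_command(spider_name), cwd=project_dir)


# Hypothetical use inside the view's get() method:
#     start_crawl("google", "/path/to/scrapy_project")
#     return Response('ok')
```

The view still returns 'ok' immediately. Anything the spider collects has to be written somewhere the Django side can read later, for example a feed export file or a database.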

CodePudding user response:

After searching on this topic for a while, I found a good explanation here: Building a RESTful Flask API for Scrapy
