How to automate scrapy


I've got a little problem with my Scrapy spider. I set up Scrapy and everything works fine, but every time I want to scrape a website I have to start the spider myself. I want it to be fully automated, but I don't know how to do that.

Currently I start the spider with cmdline.execute. I thought I could simply wrap it in a while True loop, but it turns out that doesn't work, and I found out that the spider doesn't really quit. It's hard to explain: PyCharm says "Finished with exit code 0", but if I put a print("End of program") after the cmdline.execute call, it doesn't print anything.

At this point I'm confused about what to do. Can you help me?
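For context on the symptom described above: scrapy.cmdline.execute() calls sys.exit() once the command finishes, so any code placed after it never runs, and Twisted's reactor cannot be restarted within the same process anyway. A minimal sketch of one workaround (the spider name and interval are placeholders): launch each crawl in a fresh subprocess inside the loop.

```python
import subprocess
import sys
import time

def run_spider_loop(spider_name, interval_seconds, max_runs=None):
    """Re-run a spider forever (or max_runs times) in fresh processes.

    cmdline.execute() calls sys.exit() when the crawl finishes, so any
    code after it never runs; spawning 'scrapy crawl' as a subprocess
    sidesteps that and also avoids Twisted's non-restartable reactor.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        # Each crawl gets its own interpreter, so its exit doesn't kill us.
        subprocess.run([sys.executable, "-m", "scrapy", "crawl", spider_name])
        runs += 1
        time.sleep(interval_seconds)
    return runs

# run_spider_loop("myspider", 3600)  # hypothetical spider name, run hourly
```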

CodePudding user response:

There are many options for scheduling spiders.

CRON: As Alexander commented, you can create a cron job. I think this is best suited for a situation where you have just a few spiders whose schedule you won't change often.
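A minimal crontab entry for this approach might look like the following (the project path, scrapy location, and spider name are all placeholders for your own setup):

```
# min hour day month weekday  command
0 6 * * *  cd /path/to/project && /usr/local/bin/scrapy crawl myspider >> /var/log/myspider.log 2>&1
```

Using the absolute path to scrapy matters here, since cron jobs run with a minimal environment where your shell's PATH and virtualenv are not active.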

Scrapydweb: A web interface for managing Scrapyd. You have to host it yourself, but it's quite easy to use in my experience.

Zyte: Practically the same as Scrapydweb, but it's a SaaS app that you don't host yourself. Very easy to use, but expensive.

Gerapy: I haven't tried it, but I believe it's similar to Scrapydweb and seems to be built on more modern frameworks.

CodePudding user response:

Try using scrapyd.

Scrapyd is an application for deploying and running Scrapy spiders. It enables you to deploy (upload) your projects and control their spiders using a JSON API.

Some tutorials:

How to deploy scrapy spider using scrapyd?

Deploy, Schedule & Run Your Scrapy Spiders
