I've got a little problem with my Scrapy spider. I set up Scrapy and everything works fine, but every time I want to scrape a website I have to start the spider myself. I want it to be fully automated, but I don't know how to do that.
Currently I start the spider with cmdline.execute. I thought I could simply wrap it in a while True loop, but it turns out that doesn't work, and I found out that the spider doesn't really quit. It's hard to explain: PyCharm says "Finished with exit code 0", but if I put print("End of program") after the cmdline.execute call, it doesn't print anything.
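To show what I mean, here is a stripped-down stand-in; using sys.exit in place of the real crawl is just my guess at what happens internally, but it reproduces the symptom exactly:

```python
import sys

def run_spider():
    # hypothetical stand-in for cmdline.execute(["scrapy", "crawl", "myspider"]);
    # my guess is the real call ends the interpreter the same way
    sys.exit(0)

try:
    run_spider()
    print("End of program")  # with the real cmdline.execute this never prints
except SystemExit:
    print("the call raised SystemExit instead of returning")
```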
At this point I'm confused about what to do. Can you help me?
CodePudding user response:
There are many options for scheduling spiders.
CRON: As Alexander commented, you can create a cron job. I think this is best suited for situations where you have just a few spiders and won't change the schedule often.
Scrapydweb: A web interface for managing Scrapyd. You have to host it yourself, but it's quite easy to use in my experience.
Zyte: Practically the same as Scrapydweb, but it's a SaaS app that you do not host yourself. Very easy to use, but expensive.
Gerapy: I have not tried it, but I believe it's similar to Scrapydweb while built on more modern frameworks.
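For the CRON option, a crontab entry could look like the one below; the project path, spider name, and log path are all placeholders you'd replace with your own:

```shell
# run the spider every day at 03:00 (edit with `crontab -e`)
0 3 * * * cd /home/user/myproject && scrapy crawl myspider >> /var/log/myspider.log 2>&1
```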
CodePudding user response:
Try using scrapyd.
Scrapyd is an application for deploying and running Scrapy spiders. It enables you to deploy (upload) your projects and control their spiders using a JSON API.
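As a sketch of that JSON API, scheduling a run is a POST to the schedule.json endpoint with the project and spider names as form fields ("myproject" and "myspider" below are placeholders, and the default port is 6800):

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# build the form-encoded body that Scrapyd's schedule.json endpoint expects
data = urlencode({"project": "myproject", "spider": "myspider"}).encode()
req = Request("http://localhost:6800/schedule.json", data=data)
# urlopen(req) would actually schedule the run; it needs a local scrapyd running
```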
Some tutorials: