PROBLEM
I need to import a method defined in scrapy project #1 and use it in one of the spiders of scrapy project #2.
DIRECTORY STRUCTURE
For starters, here's my directory structure (assume these are all under one root directory):
/importables                      # scrapy project #1
    /importables
        /spiders
            title_collection.py   # take class functions defined from here
/alibaba                          # scrapy project #2
    /alibaba
        /spiders
            alibabaPage.py        # use them here
WHAT I WANT
As shown above, I am trying to get scrapy to:
- Run alibabaPage.py
- From title_collection.py, import a class method named saveTitleInTitlesCollection out of a class in that file named TitleCollectionSpider
- Use saveTitleInTitlesCollection inside functions that are called in the alibabaPage.py spider.
HOW IT'S GOING...
Here's what I've done so far at the top of alibabaPage.py:
from importables.importables.spiders import saveTitleInTitlesCollection
Nope. This fails, and the error says:
builtins.ModuleNotFoundError: No module named 'importables'
How can that be? I took that import from this answer.
Next, I added this line:
sys.path.append(os.path.join(os.path.dirname(__file__), '../..'))
Then I repeated the import:
from importables.importables.spiders import saveTitleInTitlesCollection
Nope. This fails too, with the same error as the first attempt. I took this approach from this answer.
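A note on why the second attempt probably still failed: starting from the spiders folder, '../..' only climbs to the outer alibaba folder, while the import needs the shared root (one level higher) on sys.path. Separately, a method cannot be imported from a package by name; the class has to be imported from its module first. The sketch below rebuilds the layout in a temporary directory to illustrate both points. The stub TitleCollectionSpider and its return value are made up for illustration; the real class would subclass scrapy.Spider.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Recreate the directory layout from the question in a temporary folder.
root = Path(tempfile.mkdtemp())
spiders_1 = root / "importables" / "importables" / "spiders"
spiders_2 = root / "alibaba" / "alibaba" / "spiders"
spiders_1.mkdir(parents=True)
spiders_2.mkdir(parents=True)

# Hypothetical stand-in for the real spider class (not a scrapy.Spider).
(spiders_1 / "title_collection.py").write_text(
    "class TitleCollectionSpider:\n"
    "    def saveTitleInTitlesCollection(self, title):\n"
    "        return 'saved ' + title\n"
)
spider_file = spiders_2 / "alibabaPage.py"
spider_file.touch()

# dirname(__file__) is .../alibaba/alibaba/spiders, so:
two_up = spider_file.parents[2]    # '../..'    -> outer alibaba folder
three_up = spider_file.parents[3]  # '../../..' -> shared root

# Only the shared root makes 'importables' visible as a top-level name.
sys.path.append(str(three_up))
importlib.invalidate_caches()

# Import the module, take the class from it, then call the method.
module = importlib.import_module(
    "importables.importables.spiders.title_collection"
)
result = module.TitleCollectionSpider().saveTitleInTitlesCollection("widget")
print(two_up.name, "|", result)
```

With the real files, the equivalent change would be appending '../../..' instead of '../..' and importing TitleCollectionSpider from importables.importables.spiders.title_collection.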
Re-reading the post linked in the first attempt, I realized the author had put the two files in the same directory, so I tried that: I made a copy of title_collection.py and put it next to the spider, like so:
/alibaba                          # scrapy project #2
    /alibaba
        /spiders
            alibabaPage.py        # use them here
            title_collection.py   # added this
Well, that appeared to work, but didn't in the end. This threw no errors:
from alibaba.spiders.title_collection import TitleCollectionSpiderAlibaba
That led me to assume everything worked. I then added a test function named testForImport and tried importing it, but ended up with this error:
builtins.ModuleNotFoundError: No module named 'alibaba.spiders.title_collection.testForImport'; 'alibaba.spiders.title_collection' is not a package
Unfortunately, this wasn't actually achieving the goal of importing the class method I want to use, named saveTitleInTitlesCollection. I have numerous scrapy projects and really just want one project of spiders that I can import into every other project with ease.
This is not that solution, so the quest for a true way to import a bunch of class methods from one scrapy project into many others continues... can this even be done, I wonder...
WAIT, this actually didn't work after all, because I then got:
builtins.ModuleNotFoundError: No module named 'TitleCollectionSpiderAlibaba'
Next I tried:
from alibaba.spiders.title_collection import testForImport
Nope. This failed too.
But this time it gave me a slightly different error:
builtins.ImportError: cannot import name 'testForImport' from 'alibaba.spiders.title_collection' (C:\Users\User\\scrapy-webscrapers\alibaba\alibaba\spiders\title_collection.py)
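One likely reading of that ImportError, assuming testForImport was defined inside the class rather than at the top level of the file: `from module import name` only finds module-level names, so a method has to be reached through its class. A minimal sketch, using a throwaway module named title_collection_demo (a hypothetical name, chosen so it cannot clash with the real file):

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a small module whose only top-level name is the class.
tmp = Path(tempfile.mkdtemp())
(tmp / "title_collection_demo.py").write_text(
    "class TitleCollectionSpider:\n"
    "    def testForImport(self):\n"
    "        return 'testForImport worked if you see this!'\n"
)
sys.path.append(str(tmp))
importlib.invalidate_caches()

# Importing the method by name fails: it is an attribute of the class,
# not of the module.
try:
    from title_collection_demo import testForImport
    outcome = "imported"
except ImportError as exc:
    outcome = type(exc).__name__

# Importing the class and calling the method on an instance works.
from title_collection_demo import TitleCollectionSpider
message = TitleCollectionSpider().testForImport()

print(outcome)
print(message)
```

The same error also appears if the name simply does not exist in the file that Python actually loaded, for example when a stale copy of the module is picked up, so it is worth checking the path shown in the error message.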
Consider this solved
Thanks to Umair's answer, I was able to do this:
# typical scrapy spider imports...
import os
import sys
import scrapy
from ..items import AlibabaItem
# import this near the top of the page
# note: '../' is resolved against the directory scrapy is run from
sys.path.append(os.path.join(os.path.abspath('../')))
from importables.importables.spiders.title_collection import TitleCollectionSpider
...
# then, in parse method I did this...
def parse(self, response):
    alibaba_item = AlibabaItem()
    # create an instance of the imported class and call its method
    title_collection_spider_obj = TitleCollectionSpider()
    title_collection_spider_obj.testForImportTitlesCollection()
    # terminal showed this, proving it worked...
    # "testForImport worked if you see this!"
CodePudding user response:
Inside alibabaPage.py you can do this to import a class from outside your Scrapy project folder:
import os, sys
sys.path.append(os.path.join(os.path.abspath('../')))
from importables.importables.spiders.title_collection import TitleCollectionSpider
This will import the class from title_collection.py into alibabaPage.py.
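One caveat with os.path.abspath('../'): it is resolved against the current working directory, so it only points at the shared root when scrapy is started from the outer alibaba folder. A possible variation, sketched below, anchors the path to the spider file instead. The helper add_shared_root and the demo path are hypothetical, and levels_up=3 assumes the layout from the question (spiders, inner alibaba, outer alibaba, then the root).

```python
import sys
from pathlib import Path


def add_shared_root(spider_file, levels_up=3):
    """Append the folder `levels_up` levels above the spider's own
    directory to sys.path and return it (hypothetical helper)."""
    root = Path(spider_file).resolve().parents[levels_up]
    if str(root) not in sys.path:
        sys.path.append(str(root))
    return root


# Demo with a made-up path that mirrors the question's layout.
demo_file = (
    Path("shared_root") / "alibaba" / "alibaba" / "spiders" / "alibabaPage.py"
)
demo_root = add_shared_root(demo_file)
print(demo_root.name)  # shared_root
```

In the spider itself this would be called as add_shared_root(__file__) before the `from importables.importables.spiders.title_collection import TitleCollectionSpider` line.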