I have run into a problem while designing my software.

My software consists of a few classes: `Bot`, `Website`, and `Scraper`. `Bot` is the most abstract, executive class, responsible for managing the program at a high level. `Website` is a class that holds the data scraped from one particular website. `Scraper` is a class that may have multiple instances per `Website`; each instance is responsible for a different part of a single website.

`Scraper` has a function `scrape_data()` which returns the JSON data associated with the `Website`. I want to pass this data into the `Website` somehow, but I can't find a way, since `Scraper` sits on a lower level of abstraction. Here are the ideas I've tried:
# In this idea, Website would have to poll its scrapers. The scrapers are
# already polling the server, so this seems messy and inefficient.
class Website:
    def __init__(self):
        self.scrapers = list()
        self.data = dict()

    def add_scraper(self, scraper):
        self.scrapers.append(scraper)

    def add_data(self, data_type, json_data):
        # Store each kind of scraped JSON under its own key.
        self.data[data_type] = json_data

...
# The problem here is that a Scraper has no awareness of the dict of websites,
# so it cannot pass the data it returns into the respective Website.
class Bot:
    def __init__(self):
        self.scrapers = list()
        self.websites = dict()
How can I solve this problem? What more fundamental rules or design patterns apply to problems like this, so that I can use them in the future?
CodePudding user response:
One way to go about this, taking inspiration from node-based structures, is to give the `Scraper` class an attribute that directly references its respective `Website`. If I'm understanding correctly, you described a one-to-many relationship (one `Website` can have multiple `Scrapers`). Then, when a `Scraper` needs to pass its data to its `Website`, it can reference that attribute directly:
class Website:
    def __init__(self):
        # You can remove this list of scrapers, since each scraper will
        # reference its master website, not the other way around.
        self.scrapers = list()
        # I'm not sure how you want the data to be stored;
        # it could be a list, a dict, etc.
        self.data = dict()

    def add_scraper(self, scraper):
        self.scrapers.append(scraper)

    def add_data(self, data_type, json_data):
        self.data[data_type] = json_data


class Scraper:
    def __init__(self, master_website):
        # ...respective setup code...
        # This gives you a direct reference to the website;
        # master_website is a Website object.
        self.master = master_website

    def scrape_data(self):
        data_type, json_data = ...  # scrape and build the JSON data here
        self.master.add_data(data_type, json_data)
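For illustration, a minimal usage sketch, assuming `scrape_data()` has been filled in (the variable names here are just examples):

site = Website()
scraper = Scraper(site)  # the scraper keeps a direct reference to its master
scraper.scrape_data()    # pushes its JSON into site.data via add_data()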
I don't know how efficient this would be, though, or whether you need to know at any given moment which scrapers are linked to which website.
CodePudding user response:
As soon as you start talking about a one-to-many parent/child relationship, you should be thinking about compositional patterns rather than traditional inheritance. Specifically, the Decorator pattern. Your `add_scraper` method is a tip-off that you're essentially looking to build a handler stack.
The classic example for this pattern is a set of classes responsible for producing the price of a coffee. You start with a base component, "coffee", and you have one class per ingredient, each with its own price modifier: a class for whole milk, one for skim, one for sugar, one for hazelnut syrup, one for chocolate, and so on. All the ingredients, as well as the base component, share an interface that guarantees the existence of a `getPrice` method. As the user places their order, the base component gets injected into the first ingredient/wrapper class, the wrapped object gets injected into subsequent ingredient wrappers, and so on, until finally `getPrice` is called. Each implementation of `getPrice` should be written to first pull from the previously injected object, so the calculation reaches through all the layers.
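A minimal sketch of that coffee example in Python (the class names and prices are illustrative, and `getPrice` becomes `get_price` to follow Python conventions):

class Beverage:
    """Shared interface: everything in the stack exposes get_price()."""
    def get_price(self):
        raise NotImplementedError

class Coffee(Beverage):
    """The base component."""
    def get_price(self):
        return 2.00

class Ingredient(Beverage):
    """An ingredient wraps a previously injected Beverage."""
    def __init__(self, wrapped):
        self.wrapped = wrapped

class WholeMilk(Ingredient):
    def get_price(self):
        # Pull from the wrapped object first, then apply this modifier.
        return self.wrapped.get_price() + 0.50

class HazelnutSyrup(Ingredient):
    def get_price(self):
        return self.wrapped.get_price() + 0.75

# The base component is injected into the first wrapper, the wrapped
# object into the next, and so on; the calculation reaches all layers.
order = HazelnutSyrup(WholeMilk(Coffee()))
print(order.get_price())  # 3.25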
The benefits are that new ingredients can be added without impacting the existing menu, existing ones can have their price changed in isolation, and ingredients can be added to multiple types of drinks.
In your case, the data structure being decorated is the `Website` object, the ingredient classes are your `Scrapers`, and the `getPrice` method is `scrape_data`. The `scrape_data` method should expect to receive an instance of `Website` as a parameter and return it after hydration. Each `Scraper` needs no awareness of how the other scrapers work or which ones to apply; all it needs to know is that a previous one exists and adheres to an interface guaranteeing that it, too, has a `scrape_data` method. And all of them will ultimately be manipulating the same `Website` object, so what gets spit back out to your `Bot` has been hydrated by all of them.
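A sketch of how that might look, assuming the hypothetical `PriceScraper`/`ReviewScraper` names and the `Website` class from the question:

class Scraper:
    """Shared interface: every scraper hydrates a Website and returns it."""
    def __init__(self, inner=None):
        self.inner = inner  # the previously injected scraper, if any

    def scrape_data(self, website):
        # Let the wrapped scraper hydrate the website first.
        if self.inner is not None:
            website = self.inner.scrape_data(website)
        return website

class PriceScraper(Scraper):
    def scrape_data(self, website):
        website = super().scrape_data(website)
        website.add_data('prices', {'widget': 9.99})  # scraped JSON goes here
        return website

class ReviewScraper(Scraper):
    def scrape_data(self, website):
        website = super().scrape_data(website)
        website.add_data('reviews', {'widget': ['five stars']})
        return website

# Stack the scrapers, then hydrate a Website with a single call:
site = PriceScraper(ReviewScraper()).scrape_data(Website())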
This puts the onus of knowing which `Scrapers` to apply to which `Website` on your `Bot` class, which is essentially now a service class. Since it lives in the upper abstraction layer, it has the high-level perspective needed to know this.
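As a rough sketch of that wiring (again with the made-up scraper names from above), the `Bot` might compose one handler stack per website:

class Bot:
    def __init__(self):
        self.websites = dict()

    def run(self):
        # The Bot has the high-level view, so it decides which
        # scrapers apply to which website.
        stack = PriceScraper(ReviewScraper())
        self.websites['example.com'] = stack.scrape_data(Website())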