I have run into a problem while designing my software.

My software consists of a few classes: `Bot`, `Website`, and `Scraper`. `Bot` is the most abstract, executive class, responsible for managing the program at a high level. `Website` is a class that holds the data scraped from one particular website. `Scraper` is a class that may have multiple instances per `Website`; each instance is responsible for a different part of a single website.

`Scraper` has a function `scrape_data()` which returns the JSON data associated with the `Website`. I want to pass this data into the `Website` somehow, but I can't find a way, since `Scraper` sits on a lower level of abstraction. Here are the ideas I've tried:
# In this idea, Website would have to poll its scrapers. The scrapers are
# already polling the server, so this seems messy and inefficient.
class Website:
    def __init__(self):
        self.scrapers = list()
        self.data = dict()

    def add_scraper(self, scraper):
        self.scrapers.append(scraper)

    def add_data(self, data_type, json_data):
        # Store each kind of scraped JSON under its own key.
        self.data[data_type] = json_data

...
# The problem here is that a Scraper has no awareness of the dict of websites,
# so it cannot pass the data it returns into the respective Website.
class Bot:
    def __init__(self):
        self.scrapers = list()
        self.websites = dict()
How can I solve this problem? What more fundamental rules or design patterns apply to problems like this, so that I can use them in the future?
CodePudding user response:
One way to go about this, taking inspiration from node-based structures, is to give the `Scraper` class an attribute that directly references its respective `Website`. If I'm understanding correctly, you described a one-to-many relationship (one `Website` can have multiple `Scrapers`). Then, when a `Scraper` needs to pass its data to its `Website`, it can reference that attribute directly:
class Website:
    def __init__(self):
        # You can remove this list of scrapers, since each scraper will
        # reference its master website, not the other way around.
        self.scrapers = list()
        # I'm not sure how you want the data to be stored;
        # it could be a list, a dict, etc.
        self.data = dict()

    def add_scraper(self, scraper):
        self.scrapers.append(scraper)

    def add_data(self, data_type, json_data):
        self.data[data_type] = json_data


class Scraper:
    def __init__(self, master_website):
        # ...respective setup code...
        # This gives you a direct reference to the website;
        # master_website is a Website object.
        self.master = master_website

    def scrape_data(self):
        data_type, json_data = ...  # scrape and build the JSON data here
        self.master.add_data(data_type, json_data)
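For illustration, a minimal usage sketch, assuming `scrape_data()` has been filled in (the variable names here are just examples):

site = Website()
scraper = Scraper(site)  # the scraper keeps a direct reference to its master
scraper.scrape_data()    # pushes its JSON into site.data via add_data()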
I don't know how efficient this would be, though, or whether you need to know at any given moment which scrapers are linked to which website.
CodePudding user response:
As soon as you start talking about a one-to-many parent/child relationship, you should be thinking about compositional patterns rather than traditional inheritance. Specifically, the Decorator pattern. Your `add_scraper` method is a tip-off that you're essentially looking to build a handler stack.
The classic example for this pattern is a set of classes responsible for producing the price of a coffee. You start with a base component, "coffee", and you have one class per ingredient, each with its own price modifier: a class for whole milk, one for skim, one for sugar, one for hazelnut syrup, one for chocolate, and so on. All the ingredients, as well as the base component, share an interface that guarantees the existence of a `getPrice` method. As the user places their order, the base component gets injected into the first ingredient/wrapper class, the wrapped object gets injected into subsequent ingredient wrappers, and so on, until finally `getPrice` is called. Each implementation of `getPrice` should be written to first pull from the previously injected object, so the calculation reaches through all the layers.
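A minimal sketch of that coffee example in Python (the class names and prices are illustrative, and `getPrice` becomes `get_price` to follow Python conventions):

class Beverage:
    """Shared interface: everything in the stack exposes get_price()."""
    def get_price(self):
        raise NotImplementedError

class Coffee(Beverage):
    """The base component."""
    def get_price(self):
        return 2.00

class Ingredient(Beverage):
    """An ingredient wraps a previously injected Beverage."""
    def __init__(self, wrapped):
        self.wrapped = wrapped

class WholeMilk(Ingredient):
    def get_price(self):
        # Pull from the wrapped object first, then apply this modifier.
        return self.wrapped.get_price() + 0.50

class HazelnutSyrup(Ingredient):
    def get_price(self):
        return self.wrapped.get_price() + 0.75

# The base component is injected into the first wrapper, the wrapped
# object into the next, and so on; the calculation reaches all layers.
order = HazelnutSyrup(WholeMilk(Coffee()))
print(order.get_price())  # 3.25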
The benefits are that new ingredients can be added without impacting the existing menu, existing ones can have their price changed in isolation, and ingredients can be added to multiple types of drinks.
In your case, the data structure being decorated is the `Website` object, the ingredient classes are your `Scrapers`, and the `getPrice` method is `scrape_data`. The `scrape_data` method should expect to receive an instance of `Website` as a parameter and return it after hydration. Each `Scraper` needs no awareness of how the other scrapers work or which ones to apply; all it needs to know is that a previous one exists and adheres to an interface guaranteeing that it, too, has a `scrape_data` method. And all of them will ultimately be manipulating the same `Website` object, so what gets spit back out to your `Bot` has been hydrated by all of them.
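A sketch of how that might look, assuming the hypothetical `PriceScraper`/`ReviewScraper` names and the `Website` class from the question:

class Scraper:
    """Shared interface: every scraper hydrates a Website and returns it."""
    def __init__(self, inner=None):
        self.inner = inner  # the previously injected scraper, if any

    def scrape_data(self, website):
        # Let the wrapped scraper hydrate the website first.
        if self.inner is not None:
            website = self.inner.scrape_data(website)
        return website

class PriceScraper(Scraper):
    def scrape_data(self, website):
        website = super().scrape_data(website)
        website.add_data('prices', {'widget': 9.99})  # scraped JSON goes here
        return website

class ReviewScraper(Scraper):
    def scrape_data(self, website):
        website = super().scrape_data(website)
        website.add_data('reviews', {'widget': ['five stars']})
        return website

# Stack the scrapers, then hydrate a Website with a single call:
site = PriceScraper(ReviewScraper()).scrape_data(Website())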
This puts the onus of knowing which `Scrapers` to apply to which `Website` on your `Bot` class, which is essentially now a service class. Since it lives in the upper abstraction layer, it has the high-level perspective needed to know this.
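As a rough sketch of that wiring (again with the made-up scraper names from above), the `Bot` might compose one handler stack per website:

class Bot:
    def __init__(self):
        self.websites = dict()

    def run(self):
        # The Bot has the high-level view, so it decides which
        # scrapers apply to which website.
        stack = PriceScraper(ReviewScraper())
        self.websites['example.com'] = stack.scrape_data(Website())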