Using a Python websocket server as an async generator

Time:01-26

I have a scraper that requires a websocket server (I can't go into much detail on why because of company policy), and I'm trying to turn it into a template/module that is easier to reuse on other websites.

I have one main function that runs the server loop (e.g. replying to pings with pongs to keep the connection alive, and sending work and stop commands when necessary). I'm trying to turn it into an async generator that yields the HTML of scraped pages, but I can't figure out a way to turn the server into a generator.

This is essentially the code I would want (simplified to just show the main idea, of course):

```python
import asyncio, websockets

needsToStart = False  # Setting this to true gets handled somewhere else in the script

async def run(ws):
    global needsToStart

    while True:
        data = await ws.recv()

        if data == "ping":
            await ws.send("pong")
        elif "<html" in data:
            yield data  # Yielding the page data

        if needsToStart:
            await ws.send("work")  # Starts the next scraping session
            needsToStart = False

generator = websockets.serve(run, 'localhost', 9999)

while True:
    html = await anext(generator)

    # Do whatever with html
```

This, of course, doesn't work; it raises "TypeError: 'Serve' object is not callable". Is there any way to set up something along these lines? An alternative I could try is creating an intermediate object that holds the data the final loop awaits, but that seems messier to me than getting this idea to work.

Thanks in advance.
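The alternative mentioned in the question (an object that holds the data for the final loop to await) maps fairly directly onto `asyncio.Queue`, and it still gives the consumer an async generator. The sketch below assumes a recent `websockets` release in which handlers receive a single connection argument; `handler`, `pages` and `main` are hypothetical names, and the `needsToStart`/"work" logic is left out for brevity (it would go into `handler` unchanged).

```python
import asyncio
from functools import partial
from typing import AsyncIterator


async def handler(ws, queue: asyncio.Queue) -> None:
    """Websocket handler: answers pings and puts HTML pages on the queue."""
    async for data in ws:  # the loop ends when the client closes the connection
        if data == "ping":
            await ws.send("pong")
        elif "<html" in data:
            await queue.put(data)  # hand the page to the consumer instead of yielding it


async def pages(queue: asyncio.Queue) -> AsyncIterator[str]:
    """Async generator that yields each scraped page as it arrives."""
    while True:
        yield await queue.get()


async def main() -> None:
    # Imported here so the helpers above can be used without the package installed.
    import websockets

    queue: asyncio.Queue = asyncio.Queue()  # created inside the running event loop
    async with websockets.serve(partial(handler, queue=queue), "localhost", 9999):
        async for html in pages(queue):
            print(len(html))  # Do whatever with html
```

Start it with `asyncio.run(main())`. The server task only produces pages and the consumer only reads them, so the consuming loop looks like the `anext(generator)` loop from the question, just driven by `async for`.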

Answer:

For anyone who needs the same functionality: I found a solution that essentially works the other way around. Instead of yielding the data, I pass in the function that processes it. Here's the updated example:

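```python
import asyncio
from functools import partial

import websockets

needsToStart = False  # Setting this to True gets handled somewhere else in the script


def process(html):
    pass  # Do whatever with the scraped HTML


async def run(ws, htmlFunc):
    global needsToStart

    while True:
        data = await ws.recv()

        if data == "ping":
            await ws.send("pong")
        elif "<html" in data:
            htmlFunc(data)  # Processing the page data

        if needsToStart:
            await ws.send("work")  # Starts the next scraping session
            needsToStart = False


async def main():
    func = partial(run, htmlFunc=process)
    # serve() has to run inside an event loop; calling it on its own does not
    # start anything. The context manager keeps the server open.
    async with websockets.serve(func, 'localhost', 9999):
        await asyncio.Future()  # Run forever


asyncio.run(main())
```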
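A possible refinement of the callback approach, sketched under assumptions: if processing a page needs its own I/O (say, saving it to a database), `run` can accept either a plain function or an `async def` one and await the latter. `save_page` and `saved` below are hypothetical stand-ins, and the `needsToStart` logic is omitted to keep the sketch short. One trade-off to keep in mind: while an async callback is being awaited, this loop cannot answer the next "ping".

```python
import asyncio
import inspect
from functools import partial

saved = []  # hypothetical storage, for illustration only


async def save_page(html):
    """Hypothetical async callback standing in for real async I/O."""
    await asyncio.sleep(0)  # placeholder for e.g. a database write
    saved.append(html)


async def run(ws, htmlFunc):
    """Answers pings and hands each HTML page to htmlFunc (sync or async)."""
    async for data in ws:  # the loop ends when the client closes the connection
        if data == "ping":
            await ws.send("pong")
        elif "<html" in data:
            result = htmlFunc(data)
            if inspect.isawaitable(result):
                await result  # async callbacks finish before the next message is read


# Passed to websockets.serve() the same way as in the answer's code.
handler = partial(run, htmlFunc=save_page)
```

A plain function such as the answer's `process` still works unchanged, because its return value (`None`) is not awaitable and is simply ignored.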