How to avoid error while running scheduler for web-scraping?


I need to collect some data by scraping (on a legal basis), so for now I am testing the script on my own landing page. The goal is to grab a particular piece of text from the tags (in my example it is just one sentence, "СОСТАВЛЯЕМ СМЕТЫ", i.e. "WE PREPARE ESTIMATES") once every 3 hours. For testing I shortened the interval to seconds, expecting to see the phrase written on every run. But the script writes it only once and then raises an error.

import schedule
import time
from urllib.request import urlopen
from bs4 import BeautifulSoup

mf = open("C:\\Users\\Admin\\Desktop\\huyandex.txt",'a')


def job():
    html = urlopen("https://smeta-spb.com/")
    #print(html.read())
    bsObj = BeautifulSoup(html)
    nameList = bsObj.findAll("h1")
    #print(len(nameList))
    for name in nameList:
        mf.write(name.get_text())
        mf.write('\n')
    mf.close()  # closed after the first run; the next run writes to a closed file
    
schedule.every(5).seconds.do(job)

while True:
    schedule.run_pending()
    time.sleep(1)

but then this error kicks in:

ValueError: I/O operation on closed file.

How can the code be changed so that the file is written on every scheduled run?

CodePudding user response:

You can simply open the file within the function: remove the open statement from the top of the script and move it inside job(). As written, job() closes the module-level file handle at the end of its first run, so every later scheduled run tries to write to an already-closed file. A short demo shows the same failure:
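
mf = open("huyandex.txt", 'a')
mf.close()
mf.write("text")  # ValueError: I/O operation on closed file.

Moving the open into the function gives each run its own fresh handle, so your code goes like: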

import schedule
import time
from urllib.request import urlopen
from bs4 import BeautifulSoup



def job():
    mf=open("huyandex.txt",'a') # moved it inside the function
    html = urlopen("https://smeta-spb.com/")
    #print(html.read())
    bsObj = BeautifulSoup(html, "html.parser")  # explicit parser avoids a bs4 warning
    nameList = bsObj.findAll("h1")
    #print(len(nameList))
    for name in nameList:
        mf.write(name.get_text())
        mf.write('\n')
    mf.close()
    
schedule.every(5).seconds.do(job)

while True:
    schedule.run_pending()
    time.sleep(1)
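
A slightly tidier variant, sketched under the same assumptions (the schedule library, the original URL, and a relative output file name), uses a with block so the file is closed even if the request or the parsing raises:

import schedule
import time
from urllib.request import urlopen
from bs4 import BeautifulSoup


def job():
    html = urlopen("https://smeta-spb.com/")
    bsObj = BeautifulSoup(html, "html.parser")
    # the with block closes the file automatically, even on an exception
    with open("huyandex.txt", 'a') as mf:
        for name in bsObj.findAll("h1"):
            mf.write(name.get_text())
            mf.write('\n')


schedule.every(5).seconds.do(job)

while True:
    schedule.run_pending()
    time.sleep(1)

Once the test output looks right, schedule.every(3).hours.do(job) matches the 3-hour interval mentioned in the question.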