I am very new to web scraping and to Python in general. I am working on a project that requires me to scrape data from a website that refreshes its data every 10 minutes. I was able to scrape the data for the current 10-minute window, but once the page refreshes, the previous data is no longer valid. I need help with 3 things:
There is a timestamp input at the top of the website. How can I change the time in that input so that I only fetch data for that particular time period?
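My current code is:
from datetime import datetime
from pathlib import Path

import pandas as pd

URL1 = "URL.com"

# read_html returns a list with one DataFrame per <table> found on the page
tables1 = pd.read_html(URL1)
print("There are:", len(tables1), "tables")

# The required table is the ninth one on the page (index 8)
PartUsage = tables1[8].copy()

# The table has no timestamp of its own, so record the time of the scrape
now = datetime.now()
PartUsage["Date"] = now
PartUsage.set_index("Date", inplace=True)

# Make sure the output folder exists, then write the table to CSV
filepath = Path('Path.csv')
filepath.parent.mkdir(parents=True, exist_ok=True)
PartUsage.to_csv(filepath)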
I added a timestamp because the required table does not contain one. How can I use that timestamp as the value of the time input on the page?
This is company-specific data, so I cannot provide the link or any further details. Any help will be appreciated. Thank you.
CodePudding user response:
You can use cron for this. Cron is a utility that runs scripts on a fixed schedule. You can also deploy the script in an auto-running Docker container for convenience. You can find more about cron here: How do I get a Cron like scheduler in Python?
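If you would rather keep everything in Python, the same idea can be sketched as a loop that scrapes, appends the rows to the CSV, and then sleeps until the next 10-minute mark. This is only a rough sketch under assumptions: the function names (seconds_until_next_slot, append_snapshot, run_forever) are made up for illustration, "URL.com", table index 8 and "Path.csv" are the placeholders from the question, and it assumes the page refreshes on round 10-minute boundaries, which the question does not confirm.

```python
import time
from datetime import datetime, timedelta
from pathlib import Path

import pandas as pd

INTERVAL_MINUTES = 10  # refresh period stated in the question


def seconds_until_next_slot(now: datetime, interval_minutes: int = INTERVAL_MINUTES) -> float:
    """Seconds from `now` until the next slot boundary.

    Assumes slots start at multiples of `interval_minutes` past the hour,
    so the interval should divide 60 evenly.
    """
    minutes_past = now.minute % interval_minutes
    slot_start = now.replace(second=0, microsecond=0) - timedelta(minutes=minutes_past)
    next_slot = slot_start + timedelta(minutes=interval_minutes)
    return (next_slot - now).total_seconds()


def append_snapshot(table: pd.DataFrame, filepath: Path, scraped_at: datetime) -> None:
    """Tag every row with the scrape time and append the rows to a CSV file."""
    snapshot = table.copy()
    snapshot.insert(0, "Date", scraped_at)
    filepath.parent.mkdir(parents=True, exist_ok=True)
    # Write the header only when the file is being created
    snapshot.to_csv(filepath, mode="a", header=not filepath.exists(), index=False)


def run_forever(url: str, table_index: int, filepath: Path) -> None:
    """Scrape one table every interval and keep the history in one CSV."""
    while True:
        tables = pd.read_html(url)  # one DataFrame per <table> on the page
        append_snapshot(tables[table_index], filepath, datetime.now())
        time.sleep(seconds_until_next_slot(datetime.now()))
```

Calling run_forever("URL.com", 8, Path("Path.csv")) would start the loop. Because each run appends instead of overwriting, the earlier 10-minute snapshots stay in the file after the page refreshes. If you go with cron instead, drop the loop and schedule a single scrape with an entry such as */10 * * * *.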