Scraping data from a website that refreshes every 10 minutes in Python

Time:08-27

I am very new to web scraping and to Python in general. I am working on a project that requires me to scrape data from a website that refreshes its data every 10 minutes. I was able to scrape the data for the current 10-minute window, but when the data refreshes, the previous data is no longer valid. I need help with three things:

  1. There is a timestamp input at the top of the web page. How can I change the time in that input so that I fetch only the data for that particular time period?

  2. My current code is:

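    from datetime import datetime
    from pathlib import Path

    import pandas as pd

    URL1 = "URL.com"  # placeholder: the real (company-internal) URL goes here

    # read_html returns a list with one DataFrame per <table> on the page
    tables1 = pd.read_html(URL1)
    print("There are", len(tables1), "tables")

    # The required table is the 9th table on the page (index 8)
    PartUsage = tables1[8]

    # The table has no timestamp of its own, so record when it was scraped
    PartUsage["Date"] = datetime.now()
    PartUsage.set_index("Date", inplace=True)

    filepath = Path("Path.csv")
    filepath.parent.mkdir(parents=True, exist_ok=True)
    PartUsage.to_csv(filepath)  # note: overwrites the file on every run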

I added the timestamp myself because the required table has no timestamp of its own. How can I link this timestamp to the page's time input, so that the same value is used as the input?
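A minimal sketch of one way to link the two, under assumptions that should be checked against the real page: (a) the data is grouped into 10-minute windows aligned to the clock (:00, :10, :20 and so on), and (b) the time input is sent to the server as a GET query parameter. The parameter name `time`, its `%Y-%m-%d %H:%M` format and the helper names `floor_to_slot`, `build_url` and `save_snapshot` are hypothetical. To find the real parameter name and format, change the time on the page and inspect the resulting request in the browser's developer tools (Network tab); if the input is sent via POST or handled entirely by JavaScript, the URL approach will not work as-is. The idea is to round the current time down to the start of its window, use that one value both as the input and as the `Date` column, and append each snapshot to the CSV instead of overwriting it.

```python
from datetime import datetime, timedelta
from pathlib import Path
from urllib.parse import urlencode

SLOT = timedelta(minutes=10)  # assumed refresh interval of the site


def floor_to_slot(ts: datetime, slot: timedelta = SLOT) -> datetime:
    """Round ts down to the start of its window, e.g. 14:37:12 -> 14:30:00."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + ((ts - midnight) // slot) * slot


def build_url(base_url: str, when: datetime,
              param: str = "time", fmt: str = "%Y-%m-%d %H:%M") -> str:
    """Add the chosen time to base_url as a query parameter.

    `param` and `fmt` are hypothetical defaults; use the name and format
    that the real page sends when you change its time input.
    """
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode({param: when.strftime(fmt)})


def save_snapshot(table, slot_start: datetime, path: Path) -> None:
    """Append one scraped table (a pandas DataFrame) to a CSV file.

    Every row is stamped with the start of its 10-minute window, and the
    header is written only when the file does not exist yet.
    """
    stamped = table.assign(Date=slot_start).set_index("Date")
    path.parent.mkdir(parents=True, exist_ok=True)
    stamped.to_csv(path, mode="a", header=not path.exists())
```

Usage, with `URL1` and the table index taken from the question: `slot = floor_to_slot(datetime.now())`, then `tables = pd.read_html(build_url(URL1, slot))` and `save_snapshot(tables[8], slot, Path("Path.csv"))`. Passing an earlier `slot` to `build_url` would request an earlier window, if the site keeps that data.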

This is company-specific data, so I cannot share the link or any further details. Any help will be appreciated. Thank you.

Answer:

You can use cron for this. Cron is a job scheduler on Unix-like systems that runs commands or scripts on a fixed schedule, so it can start your scraper once in every 10-minute window. For convenience, you can also deploy it in an auto-running Docker container. For more about cron-like scheduling in Python, see: How do I get a Cron like scheduler in Python?
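With cron, a crontab entry such as `*/10 * * * * /usr/bin/python3 /path/to/scraper.py` runs the script every 10 minutes (both paths are placeholders to adjust). If cron is not available, for example on Windows, a plain-Python loop can do a similar job. The sketch below assumes the scraping code has been wrapped in a function; `run_every`, `seconds_until_next_slot` and `scrape_once` are hypothetical names, and each run is aligned to the next 10-minute boundary of the clock.

```python
import time
from datetime import datetime, timedelta

INTERVAL = timedelta(minutes=10)


def seconds_until_next_slot(now: datetime, interval: timedelta = INTERVAL) -> float:
    """Seconds from `now` to the next interval boundary strictly after it."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = now - midnight
    remaining = interval - (elapsed % interval)
    return remaining.total_seconds()


def run_every(job, interval: timedelta = INTERVAL) -> None:
    """Call job() once per interval, forever, aligned to the clock."""
    while True:
        time.sleep(seconds_until_next_slot(datetime.now(), interval))
        try:
            job()
        except Exception as exc:  # keep the loop alive if one run fails
            print(f"{datetime.now():%Y-%m-%d %H:%M:%S} scrape failed: {exc}")


def scrape_once() -> None:
    """Hypothetical placeholder: put the read_html / to_csv code here."""
    print("scraping at", datetime.now())
```

Start it with `run_every(scrape_once)`. The Python process has to stay running for this to work, which is the main practical difference from cron, where the system starts the script for you on each run.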
