How to extract data from multiple pages which have date in their url?

Time:10-06

I want to extract content from a website where the link is as follows:

"www.example.com/getpublicreport?date=2021-10-01"

Using the Requests library, what code would extract data from multiple pages, moving between them by changing the date in the URL?

For example, if I want to extract data from 2019-01-01 up to the current date, how do I write that with the requests library?

Answer:

www.example.com/getpublicreport?date=2021-10-01

This is an example of a URL with query parameters. requests supports these through the params argument: pass a dict of key-value pairs and requests builds the query string for you, as follows:

import requests
url = "http://www.example.com/getpublicreport"
parameters = {"date": "2021-10-01"}
r = requests.get(url, params=parameters)
print(r.url)  # http://www.example.com/getpublicreport?date=2021-10-01

For more on URL syntax, see RFC 1738 and RFC 3986, which updated it.
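To cover a whole date range, the params idea can be combined with a loop over dates. Below is a minimal stdlib-only sketch that generates every date between two endpoints (inclusive) and shows the URL each one produces. The base URL is the placeholder from the question, and `date_range` / `report_urls` are hypothetical helper names. In real code you would pass `{"date": d.isoformat()}` as `params` to `requests.get` instead of building the string yourself; `urlencode` is used here only so the resulting URLs can be inspected without any network access.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Placeholder endpoint from the question; replace with the real one.
BASE_URL = "http://www.example.com/getpublicreport"


def date_range(start, end):
    """Yield every date from start to end, inclusive."""
    for offset in range((end - start).days + 1):
        yield start + timedelta(days=offset)


def report_urls(start, end):
    """Build one query URL per day, in the form ?date=YYYY-MM-DD."""
    return [
        f"{BASE_URL}?{urlencode({'date': d.isoformat()})}"
        for d in date_range(start, end)
    ]


if __name__ == "__main__":
    for url in report_urls(date(2019, 1, 1), date(2019, 1, 3)):
        print(url)
```

Counting days with `(end - start).days + 1` keeps the end date included and returns nothing if `start` is after `end`.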

Answer:

You can use the datetime module to step through the dates one day at a time.

For example:

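import datetime

import requests


def extract_data(start_date, end_date):
    # Request one report per day, from start_date through end_date inclusive
    while start_date <= end_date:
        # requests needs the scheme (http://) or it raises MissingSchema
        yield requests.get('http://www.example.com/getpublicreport?date=%s' % start_date.isoformat())
        start_date += datetime.timedelta(days=1)  # advance by one day

if __name__ == '__main__':
    # Python 3 rejects integer literals with leading zeros, so write 1, not 01
    for r in extract_data(datetime.date(2019, 1, 1), datetime.date.today()):
        print(r.content)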
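Fetching one page per day since 2019 means hundreds of requests, so a slightly more defensive loop can help. The sketch below makes several assumptions that are not from the original answers: the endpoint is the placeholder from the question, a 404 means no report was published that day (so it is skipped), and a 1-second pause between requests is acceptable to the server. It uses a `requests.Session` to reuse the connection, sets a timeout, and calls `raise_for_status()` so other HTTP errors are not silently stored. The `get` parameter is a hypothetical hook that lets you pass a stub instead of hitting the network.

```python
import time
from datetime import date, timedelta

# Placeholder endpoint from the question; replace with the real one.
BASE_URL = "http://www.example.com/getpublicreport"


def fetch_reports(start, end, get=None, delay=1.0):
    """Fetch one report per day from start to end inclusive.

    Returns a dict mapping each date to the response body text.
    Days that answer 404 are skipped (assumed: no report that day).
    `get` is any callable shaped like requests.get; by default a
    requests.Session is created so the connection is reused.
    """
    session = None
    if get is None:
        import requests  # imported lazily so a stub `get` needs no network
        session = requests.Session()
        get = session.get
    reports = {}
    try:
        day = start
        while day <= end:
            resp = get(BASE_URL, params={"date": day.isoformat()}, timeout=30)
            if resp.status_code != 404:
                resp.raise_for_status()  # raise on any other 4xx/5xx error
                reports[day] = resp.text
            day += timedelta(days=1)
            if delay:
                time.sleep(delay)  # pause between requests to be polite
    finally:
        if session is not None:
            session.close()
    return reports
```

Usage: `fetch_reports(date(2019, 1, 1), date.today())` returns a dict keyed by date, which you can then parse or write to disk.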
