Getting the modified date of files - web scraping with BeautifulSoup in Python

Time:03-23

I am trying to download all CSV files from the following website: [screenshot not preserved]

CodePudding user response:

You can use list comprehensions to collect the CSV links and their modified dates:

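from bs4 import BeautifulSoup
import requests
import pandas as pd

url = 'https://emi.ea.govt.nz/Wholesale/Datasets/FinalPricing/EnergyPrices'
r = requests.get(url)
print(r)  # prints the response object, including its status code
soup = BeautifulSoup(r.text, 'html.parser')

# NOTE: the attribute filters inside td[...] were lost from the original post,
# so 'td[] a' is not a valid selector as written. Fill in the attribute that
# identifies the link cells and the date cells on the page before running.
# Build absolute URLs by prefixing the site root to each relative href.
csv_links = ['https://emi.ea.govt.nz' + a['href'] for a in soup.select('td[] a')]
# Text of the date cells; [1:] skips the first match.
modified_date = [cell.text for cell in soup.select('td[] a')[1:]]

# Pair each link with its date, one row per file.
df = pd.DataFrame(data=list(zip(csv_links, modified_date)), columns=['csv_links', 'modified_date'])
print(df)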

Output:

                                      csv_links         modified_date
0    https://emi.ea.govt.nz/Wholesale/Datasets/Fina...   22 Mar 2022
1    https://emi.ea.govt.nz/Wholesale/Datasets/Fina...   22 Mar 2022
2    https://emi.ea.govt.nz/Wholesale/Datasets/Fina...   22 Mar 2022
3    https://emi.ea.govt.nz/Wholesale/Datasets/Fina...   22 Mar 2022
4    https://emi.ea.govt.nz/Wholesale/Datasets/Fina...   22 Mar 2022
..                                                 ...           ...
107  https://emi.ea.govt.nz/Wholesale/Datasets/Fina...   20 Dec 2021
108  https://emi.ea.govt.nz/Wholesale/Datasets/Fina...   20 Dec 2021
109  https://emi.ea.govt.nz/Wholesale/Datasets/Fina...   20 Dec 2021
110  https://emi.ea.govt.nz/Wholesale/Datasets/Fina...   20 Dec 2021
111  https://emi.ea.govt.nz/Wholesale/Datasets/Fina...   20 Dec 2021

[112 rows x 2 columns]
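
The modified_date column in the output above holds plain strings such as "22 Mar 2022". If you want to compare or sort by date, they need to be parsed first. The following is a minimal sketch using only the standard library; the helper name parse_modified is hypothetical, and the format string assumes every value follows the day, abbreviated English month, year pattern seen in the output (note that %b depends on the active locale).

```python
from datetime import datetime


def parse_modified(text):
    """Parse a date such as '22 Mar 2022' (day, abbreviated month, year)."""
    # Assumption: all values use this format, as in the sample output.
    return datetime.strptime(text.strip(), '%d %b %Y').date()


# Sample values copied from the modified_date column shown in the output.
samples = ['22 Mar 2022', '20 Dec 2021']
parsed = [parse_modified(s) for s in samples]

# Once parsed, the dates compare chronologically rather than alphabetically.
newest = max(parsed)
print(newest.isoformat())
```

With pandas, pd.to_datetime(df['modified_date'], format='%d %b %Y') should perform the same conversion on the whole column at once.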
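
The answer builds each link by concatenating the site root with the href, which works as long as every href is a root-relative path. As a possible alternative, urllib.parse.urljoin resolves hrefs against the page URL and leaves already absolute URLs untouched. This is only a sketch: the helper name absolute_links and the sample hrefs are made up for illustration and are not taken from the site.

```python
from urllib.parse import urljoin

# Page the links were scraped from (taken from the answer's code).
PAGE_URL = 'https://emi.ea.govt.nz/Wholesale/Datasets/FinalPricing/EnergyPrices'


def absolute_links(hrefs, base=PAGE_URL):
    """Resolve each href against the page URL."""
    return [urljoin(base, href) for href in hrefs]


# Hypothetical hrefs for illustration; real values would come from a['href'].
sample_hrefs = [
    '/Wholesale/Datasets/FinalPricing/EnergyPrices/example.csv',
    'https://emi.ea.govt.nz/already/absolute.csv',
]
resolved = absolute_links(sample_hrefs)
print(resolved)
```

In the answer's code this would replace the string concatenation inside the csv_links comprehension.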