I am fairly new to web scraping. I have scraped one of the zip files seen here. The goal is to append its files into a final data frame called final_df. Below is a snippet of my code, which runs well.
import requests
import pandas as pd
from io import BytesIO
from zipfile import ZipFile

zip_url = "https://www.omie.es/es/file-download?parents[0]=marginalpdbc&filename=marginalpdbc_2017.zip"
dfs = []
with ZipFile(BytesIO(requests.get(zip_url, verify=False).content)) as zf:
    for file in zf.namelist():
        df = pd.read_csv(
            zf.open(file),
            sep=";",
            skiprows=1,
            skipfooter=1,
            engine="python",
            header=None,
        )
        dfs.append(df)
final_df = pd.concat(dfs)
# print first 10 rows:
print(final_df.head(10).to_markdown(index=False))
This works well for a single year, such as 2017, but I am curious whether we could get all the years in one swoop. My thinking is to build the URL with an f-string and change the year on each iteration.
date_list = ['2017','2018','2019','2020','2021']
dfs = []
for dates in date_list:
    with ZipFile(BytesIO(requests.get(f'"https://www.omie.es/es/file-download?parents[0]=marginalpdbc&filename=marginalpdbc_{dates}.zip"', verify=False).content)) as zf:
        for file in zf.namelist():
            df = pd.read_csv(
                zf.open(file),
                sep=";",
                skiprows=1,
                skipfooter=1,
                engine="python",
                header=None,
            )
            dfs.append(df)
final_df = pd.concat(dfs)
# print first 10 rows:
print(final_df.head(10).to_markdown(index=False))
If we isolate just the f-string, we see output such as
"https://www.omie.es/es/file-download?parents[0]=marginalpdbc&filename=marginalpdbc_2020.zip"
"https://www.omie.es/es/file-download?parents[0]=marginalpdbc&filename=marginalpdbc_2021.zip"
...etc.
Yet when I feed these URLs through the loop above, I get an error saying "InvalidSchema: No connection adapters were found for '"https://www.omie.es/es/file-download?parents[0]=marginalpdbc&filename=marginalpdbc_2017.zip"'"
What would be the best workaround?
CodePudding user response:
This error means that the requests module cannot identify which protocol your request needs (e.g. http, https, ftp).
This happens in your case because the URL string starts with a literal " character (and ends with one as well):
with ZipFile(BytesIO(requests.get(f'"https://www.omie.es/es/file-download?parents[0]=marginalpdbc&filename=marginalpdbc_{dates}.zip"', verify=False).content)) as zf:
#                                   ^
Requests is looking for an adapter for a "https protocol (with the quote character included), which doesn't exist :)
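To see the difference without making any network call, here is a small sketch using only the standard library. The helper has_http_scheme is a hypothetical stand-in for the prefix match that requests does when it picks an adapter (it is not requests' own code), and good_url / bad_url are names made up for this illustration:

```python
from urllib.parse import urlparse

good_url = "https://www.omie.es/es/file-download?parents[0]=marginalpdbc&filename=marginalpdbc_2017.zip"
# The f-string in the question wraps the URL in literal double quotes.
bad_url = f'"{good_url}"'


def has_http_scheme(url):
    """Hypothetical helper: does the string begin with an http(s) prefix?"""
    return url.lower().startswith(("http://", "https://"))


print(has_http_scheme(good_url))         # True
print(has_http_scheme(bad_url))          # False, the first character is "
print(repr(urlparse(good_url).scheme))   # 'https'
print(repr(urlparse(bad_url).scheme))    # '' (no valid scheme found)
```

The quoted string no longer begins with https://, so nothing registered for http or https can match it.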
Just delete the extra double quotes inside the f-string, so that the string begins with https:// directly.
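With the quotes removed, the multi-year loop could look roughly like the sketch below. It assumes the same URL pattern and read_csv options as in the question; the names URL_TEMPLATE, build_url, read_zip and load_years are my own for illustration, and the timeout and raise_for_status() calls are additions that the question's code does not have:

```python
from io import BytesIO
from zipfile import ZipFile

import pandas as pd
import requests

# URL pattern taken from the question; only the year varies.
URL_TEMPLATE = (
    "https://www.omie.es/es/file-download"
    "?parents[0]=marginalpdbc&filename=marginalpdbc_{year}.zip"
)


def build_url(year):
    """Return the download URL for one year, with no stray quote characters."""
    return URL_TEMPLATE.format(year=year)


def read_zip(raw_bytes):
    """Parse every file in a zip archive (given as bytes) into one DataFrame."""
    frames = []
    with ZipFile(BytesIO(raw_bytes)) as zf:
        for name in zf.namelist():
            with zf.open(name) as fh:
                frames.append(
                    pd.read_csv(
                        fh,
                        sep=";",
                        skiprows=1,      # drop the title line
                        skipfooter=1,    # drop the closing marker line
                        engine="python",
                        header=None,
                    )
                )
    return pd.concat(frames, ignore_index=True)


def load_years(years):
    """Download and parse one archive per year, then stack the results."""
    frames = []
    for year in years:
        # verify=False mirrors the question; it disables TLS certificate checks.
        response = requests.get(build_url(year), verify=False, timeout=60)
        response.raise_for_status()  # stop early on an HTTP error
        frames.append(read_zip(response.content))
    return pd.concat(frames, ignore_index=True)


# Example call (performs network requests):
# final_df = load_years(["2017", "2018", "2019", "2020", "2021"])
# print(final_df.head(10))
```

Keeping the URL construction in its own small function makes it easy to print and inspect the exact string before it is handed to requests.get.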