InvalidSchema: No connection adapters were found when working with a Python web scraper

Time:08-31

I am rather new to web scraping. I have scraped one of the zip files seen here. The goal is to append the files it contains into a final data frame called final_df. Below is a snippet of my code that runs well.

from io import BytesIO
from zipfile import ZipFile

import pandas as pd
import requests

zip_url = "https://www.omie.es/es/file-download?parents[0]=marginalpdbc&filename=marginalpdbc_2017.zip"

dfs = []

with ZipFile(BytesIO(requests.get(zip_url, verify=False).content)) as zf:
    for file in zf.namelist():
        df = pd.read_csv(
            zf.open(file),
            sep=";",
            skiprows=1,
            skipfooter=1,
            engine="python",
            header=None,
        )
        dfs.append(df)
               

final_df = pd.concat(dfs)

# print first 10 rows:
print(final_df.head(10).to_markdown(index=False))

This works well for a single year's zip file, such as 2017. However, I am curious whether we could get all the years in one swoop. My thinking is to build the URL with an f-string and change the year in each iteration.

date_list = ['2017','2018','2019','2020','2021']
dfs = []
for dates in date_list:
    with ZipFile(BytesIO(requests.get(f'"https://www.omie.es/es/file-download?parents[0]=marginalpdbc&filename=marginalpdbc_{dates}.zip"', verify=False).content)) as zf:
        for file in zf.namelist():
            df = pd.read_csv(
                zf.open(file),
                sep=";",
                skiprows=1,
                skipfooter=1,
                engine="python",
                header=None,
                )
            dfs.append(df)



final_df = pd.concat(dfs)

# print first 10 rows:
print(final_df.head(10).to_markdown(index=False))

If we print just the f-string on its own, we see output such as

"https://www.omie.es/es/file-download?parents[0]=marginalpdbc&filename=marginalpdbc_2020.zip"
"https://www.omie.es/es/file-download?parents[0]=marginalpdbc&filename=marginalpdbc_2021.zip"

...etc.

Yet when I feed this using the above loop I get an error saying "InvalidSchema: No connection adapters were found for '"https://www.omie.es/es/file-download?parents[0]=marginalpdbc&filename=marginalpdbc_2017.zip"'"

What would be the best workaround?

CodePudding user response:

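This error means that the requests module cannot work out which protocol your request uses (e.g. http, https, ftp). Requests picks a connection adapter by matching the start of the URL against the prefixes it has registered, which by default are http:// and https://.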

This happens in your case because the URL string itself starts with a literal " character: the f-string wraps the URL in an extra pair of double quotes.

with ZipFile(BytesIO(requests.get(f'"https://www.omie.es/es/file-download?parents[0]=marginalpdbc&filename=marginalpdbc_{dates}.zip"', verify=False).content)) as zf:
#                                 ^^^

Requests is looking for an adapter for the "https protocol (quote character included), which doesn't exist :)
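
To see why the quoted string has no usable protocol, here is a small sketch that uses only the standard library. It is not how requests is implemented internally; has_http_scheme is a hypothetical helper that just checks whether the parsed scheme is one requests can route by default.

```python
from urllib.parse import urlsplit


def has_http_scheme(url):
    """Hypothetical helper: True if the URL parses with an http or https scheme."""
    return urlsplit(url).scheme in ("http", "https")


year = "2017"

# As written in the question: the double quotes are part of the string itself.
quoted_url = f'"https://www.omie.es/es/file-download?parents[0]=marginalpdbc&filename=marginalpdbc_{year}.zip"'

# Without the inner quotes the string starts directly with https://
clean_url = f"https://www.omie.es/es/file-download?parents[0]=marginalpdbc&filename=marginalpdbc_{year}.zip"

print(has_http_scheme(quoted_url))  # False
print(has_http_scheme(clean_url))   # True
```

The first character of quoted_url is ", so nothing that looks for a leading http:// or https:// can match it.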

Just delete the extra quotes, so that the f-string contains only the URL and nothing around it.
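
For reference, a minimal sketch of the loop with the quotes removed might look like the following. The read_csv options and verify=False are copied from the question; build_zip_url and load_years are hypothetical names introduced here, and the raise_for_status() call is an addition that was not in the original code.

```python
from io import BytesIO
from zipfile import ZipFile

# URL pattern taken from the question; only the year changes per iteration.
URL_TEMPLATE = (
    "https://www.omie.es/es/file-download"
    "?parents[0]=marginalpdbc&filename=marginalpdbc_{year}.zip"
)


def build_zip_url(year):
    """Return the download URL for one year, with no quote characters inside it."""
    return URL_TEMPLATE.format(year=year)


def load_years(years):
    """Download each yearly archive and concatenate every file inside it."""
    # Third-party imports are local so build_zip_url can be used without them.
    import pandas as pd
    import requests

    dfs = []
    for year in years:
        # verify=False mirrors the question; it disables TLS certificate checks.
        response = requests.get(build_zip_url(year), verify=False)
        # Addition: stop on an HTTP error instead of trying to unzip an error page.
        response.raise_for_status()
        with ZipFile(BytesIO(response.content)) as zf:
            for name in zf.namelist():
                with zf.open(name) as fh:
                    df = pd.read_csv(
                        fh,
                        sep=";",
                        skiprows=1,
                        skipfooter=1,
                        engine="python",
                        header=None,
                    )
                dfs.append(df)
    return pd.concat(dfs)
```

Calling load_years(['2017', '2018', '2019', '2020', '2021']) should then give the same kind of final_df as the single-year version, assuming every yearly archive exists on the server.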
