Issue when reading csv file from url using pandas.read_csv

Time:12-03

I am trying to import a csv file from the following url

"https://www.marketwatch.com/games/stackoverflowq/download?view=holdings&pub=4JwsLs_Gm4kj&isDownload=true"

using the pandas read_csv function. However, I get the following error:

StopIteration: 

The above exception was the direct cause of the following exception:
...
--> 386         raise EmptyDataError("No columns to parse from file") from err
    388     line = self.names[:]
    390 this_columns: list[Scalar | None] = []

EmptyDataError: No columns to parse from file
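For context, pd.read_csv raises this exact EmptyDataError when its input contains nothing at all. That suggests the URL most likely returned an empty body to the script, rather than the CSV that the browser receives. Here is a minimal reproduction, with an empty string standing in for the empty HTTP response; the helper name empty_download_error is hypothetical:

```python
from io import StringIO

import pandas as pd
from pandas.errors import EmptyDataError


def empty_download_error():
    """Return the message read_csv produces for completely empty input."""
    try:
        pd.read_csv(StringIO(""))  # stands in for an empty HTTP response body
    except EmptyDataError as err:
        return str(err)
    return None


print(empty_download_error())  # No columns to parse from file
```

If the server had sent an HTML page instead, read_csv would more likely fail with a parsing error or return a garbled frame, so the empty-body explanation fits this particular error best.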

Downloading the csv manually and then reading it with pd.read_csv yields the expected output without issues. As I need to repeat this for multiple csvs, I would like to directly import the csvs without having to manually download them each time.

I have also tried this solution https://stackoverflow.com/questions/47243024/pandas-read-csv-on-dynamic-url-gives-emptydataerror-no-columns-to-parse-from-fi, which also resulted in the 'No columns to parse from file' error.

The only link I could find is the one behind the Download button in the page's HTML, and it has no .csv ending:

<a href="/games/stackoverflowq/download?view=holdings&amp;pub=4JwsLs_Gm4kj&amp;isDownload=true" download="Holdings - Stack Overflowq.csv" rel="nofollow">Download</a>
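The missing .csv ending is not a problem in itself, because pd.read_csv parses whatever the URL returns. The href is relative and HTML-escaped (&amp;), however, so it has to be unescaped and resolved against the site before requests or pandas can use it. A small sketch with the standard library, assuming the link lives on https://www.marketwatch.com/. It turns the button's href back into the URL quoted at the top of the question:

```python
from html import unescape
from urllib.parse import urljoin

# href attribute copied verbatim from the Download button (still HTML-escaped)
HREF = "/games/stackoverflowq/download?view=holdings&amp;pub=4JwsLs_Gm4kj&amp;isDownload=true"


def absolute_download_url(href, page_url="https://www.marketwatch.com/"):
    """Unescape an href value and resolve it against the page it came from."""
    return urljoin(page_url, unescape(href))


print(absolute_download_url(HREF))
# https://www.marketwatch.com/games/stackoverflowq/download?view=holdings&pub=4JwsLs_Gm4kj&isDownload=true
```

This only confirms that the question uses the right URL. It does not fix the error, which comes from the request not being logged in.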

Edit: Cleaned up the question in case somebody has a similar issue.

Answer:

The issue was indeed that the data could only be accessed after logging in. I resolved it by logging in with Selenium and then, following this answer, copying the browser's user agent and cookies into a requests session.

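from io import StringIO

import pandas as pd
import requests
from selenium import webdriver

url = "https://www.marketwatch.com/games/stackoverflowq/download?view=holdings&pub=4JwsLs_Gm4kj&isDownload=true"

# Open a browser and log in to MarketWatch first
# (assumption: the login is done by hand in the window Selenium opens)
driver = webdriver.Chrome()
driver.get("https://www.marketwatch.com/")
input("Log in in the browser window, then press Enter...")

# Start a requests session that uses the same user agent as the Selenium browser
s = requests.Session()
selenium_user_agent = driver.execute_script("return navigator.userAgent;")
s.headers.update({"user-agent": selenium_user_agent})

# Copy the login cookies from the Selenium driver into the session
for cookie in driver.get_cookies():
    s.cookies.set(cookie["name"], cookie["value"], domain=cookie["domain"])

# Download the CSV with the authenticated session and parse it
response = s.get(url)
response.raise_for_status()
data = response.content.decode("utf-8")
df = pd.read_csv(StringIO(data))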
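The question needs several CSVs, so the same logged-in session can be reused in a loop instead of repeating the Selenium login for each file. Below is a minimal sketch. read_csvs is a hypothetical helper, and session can be any object with a requests-style .get(), such as the session s built above. The empty-or-HTML check is a heuristic assumption for spotting a logged-out session, not something the site documents:

```python
from io import StringIO

import pandas as pd


def read_csvs(session, urls):
    """Download each URL with an authenticated session; return {url: DataFrame}."""
    frames = {}
    for url in urls:
        response = session.get(url)
        response.raise_for_status()
        text = response.content.decode("utf-8")
        # Heuristic: an empty body or an HTML page usually means "not logged in"
        head = text.lstrip().lower()
        if not head or head.startswith(("<!doctype", "<html")):
            raise ValueError(f"No CSV returned for {url}; is the session still logged in?")
        frames[url] = pd.read_csv(StringIO(text))
    return frames
```

Raising a ValueError with the URL makes an expired login easy to tell apart from a genuinely malformed file, instead of surfacing as the generic EmptyDataError.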