I am trying to import a CSV file from the following URL
"https://www.marketwatch.com/games/stackoverflowq/download?view=holdings&pub=4JwsLs_Gm4kj&isDownload=true"
using the pandas read_csv function. However, I get the following error:
StopIteration:
The above exception was the direct cause of the following exception:
...
--> 386 raise EmptyDataError("No columns to parse from file") from err
388 line = self.names[:]
390 this_columns: list[Scalar | None] = []
EmptyDataError: No columns to parse from file
Downloading the CSV manually and then reading it with pd.read_csv yields the expected output without issues. Since I need to repeat this for multiple CSVs, I would like to import them directly without having to download each one manually.
I have also tried this solution https://stackoverflow.com/questions/47243024/pandas-read-csv-on-dynamic-url-gives-emptydataerror-no-columns-to-parse-from-fi, which resulted in the same 'No columns to parse from file' error.
I could only find a link in the page's HTML, behind the download button, and it has no .csv extension:
<a href="/games/stackoverflowq/download?view=holdings&pub=4JwsLs_Gm4kj&isDownload=true" download="Holdings - Stack Overflowq.csv" rel="nofollow">Download</a>
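The href in that anchor tag is relative, so it has to be joined with the site root before it can be requested directly. A minimal sketch using the standard library's urljoin (the base URL and href are taken from the question above):

```python
from urllib.parse import urljoin

base = "https://www.marketwatch.com"
href = "/games/stackoverflowq/download?view=holdings&pub=4JwsLs_Gm4kj&isDownload=true"

# Resolve the relative href against the site root
url = urljoin(base, href)
print(url)
# https://www.marketwatch.com/games/stackoverflowq/download?view=holdings&pub=4JwsLs_Gm4kj&isDownload=true
```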
Edit: Cleaned up the question in case somebody has a similar issue.
CodePudding user response:
The issue was indeed that the data can only be accessed after logging in. I managed to resolve it using Selenium, following this answer.
from io import StringIO

import pandas as pd
import requests
from selenium import webdriver

# Log in via Selenium first, so the driver holds an authenticated session
driver = webdriver.Chrome()
# ... perform the login on the site here ...

url = "https://www.marketwatch.com/games/stackoverflowq/download?view=holdings&pub=4JwsLs_Gm4kj&isDownload=true"

# Start a requests session that mimics the Selenium browser
s = requests.Session()
selenium_user_agent = driver.execute_script("return navigator.userAgent;")
s.headers.update({"user-agent": selenium_user_agent})

# Copy the login cookies from the Selenium driver into the session
for cookie in driver.get_cookies():
    s.cookies.set(cookie['name'], cookie['value'], domain=cookie['domain'])

# Download and read the CSV
response = s.get(url)
if response.ok:
    data = response.content.decode('utf8')
    df = pd.read_csv(StringIO(data))
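Since the session carries the login cookies, repeating this for multiple CSVs is just a matter of applying the same fetch-and-parse step to each URL. A small sketch that wraps it in a helper; read_remote_csv is a name I'm introducing here, not part of any library:

```python
from io import StringIO

import pandas as pd


def read_remote_csv(session, url):
    """Fetch a CSV over an authenticated session and parse it into a DataFrame.

    Returns None if the request fails, so a loop over many URLs can skip
    failures instead of raising.
    """
    response = session.get(url)
    if not response.ok:
        return None
    return pd.read_csv(StringIO(response.content.decode("utf8")))


# Usage with the authenticated session `s` from above (URLs are placeholders):
# dfs = [read_remote_csv(s, url) for url in urls]
```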