How to parse CSV into pandas dataframe

I'm having a couple of issues automating the download of a CSV. First, a plain pandas read_csv(url) call raises an SSL error, so I switched to requests and tried to parse the response myself. Second, parsing the response raises errors. I suspect the URL is actually returning a zip file; if so, how can I get around that?

Here is the URL: https://www.californiadgstats.ca.gov/download/interconnection_rule21_applications/ and here is the code:

import pandas as pd
import numpy as np
import os
import io
import requests 
import urllib3
requests.packages.urllib3.util.ssl_.DEFAULT_CIPHERS = 'ALL:@SECLEVEL=1'


url = "https://www.californiadgstats.ca.gov/download/interconnection_rule21_applications/"
res = requests.get(url).content
data = pd.read_csv(io.StringIO(res.decode('utf-8')))

CodePudding user response：

If the content is in zip format, you need to unzip it first and then read its contents (csv, txt, ...).

I wasn't able to download the file myself because the host is slow.
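
One way to check whether the response really is a zip archive before unzipping it is the standard library's zipfile.is_zipfile, which also accepts a file-like object. This is only a minimal sketch: the helper name looks_like_zip is made up for illustration, and payload stands for the raw bytes of the response (res in the question).

```python
import io
import zipfile


def looks_like_zip(payload: bytes) -> bool:
    """Return True if the given bytes look like a zip archive."""
    # is_zipfile accepts a path or a seekable file-like object,
    # so the bytes are wrapped in BytesIO instead of being saved to disk.
    return zipfile.is_zipfile(io.BytesIO(payload))
```

If this returns True for the downloaded bytes, decoding them as UTF-8 text and passing them to read_csv is not expected to work, because the CSV sits inside the archive.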

CodePudding user response：

Here is the answer I found. I don't actually need to save these files locally, so if anyone knows how to parse zip files without writing them to disk, that would be great. I'm also not sure why I get the SSL error with pandas but not with requests.

import requests
import zipfile
from io import BytesIO

url = "https://www.californiadgstats.ca.gov/download/interconnection_rule21_applications/"
pathSave = "C:/Users/wherever"

r = requests.get(url)
r.raise_for_status()  # stop early on an HTTP error response

# Use a name other than "zipfile" so the module is not shadowed
with zipfile.ZipFile(BytesIO(r.content)) as zf:
    zf.extractall(pathSave)
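
For parsing the archive without saving anything locally, one possible approach is to open the archive from memory and hand the CSV member straight to pandas. This is a sketch under assumptions, not a tested solution for this particular URL: the helper name read_first_csv is hypothetical, and it assumes the archive contains at least one member whose name ends in .csv.

```python
import io
import zipfile

import pandas as pd


def read_first_csv(payload: bytes) -> pd.DataFrame:
    """Read the first .csv member of an in-memory zip archive into a DataFrame."""
    with zipfile.ZipFile(io.BytesIO(payload)) as zf:
        # Pick the first member that looks like a CSV file (assumption:
        # the archive holds at least one such member).
        csv_names = [n for n in zf.namelist() if n.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError("no .csv member found in the archive")
        # ZipFile.open returns a binary file-like object that read_csv accepts.
        with zf.open(csv_names[0]) as member:
            return pd.read_csv(member)
```

With the response from the code above, this would be called as df = read_first_csv(r.content), so nothing is extracted to pathSave.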