How to download CSV from url starting with blob:https

Time:07-10

I'm trying to set up a script that automatically pulls current data on the 2022 Monkeypox outbreak from the US CDC. The data is behind a link on this page: https://www.cdc.gov/poxvirus/monkeypox/response/2022/us-map.html. The actual data link changes each time the page is loaded, but looks something like blob:https://www.cdc.gov/c0162bc4-2fb0-4ece-8903-28e2544c9258 (the final part is a different random string of hex digits and dashes on each load; the rest stays the same). I have tried various methods to download it, for instance:

import requests

url = 'blob:https://www.cdc.gov/c0162bc4-2fb0-4ece-8903-28e2544c9258'
res = requests.get(url)

This raises a requests.exceptions.InvalidSchema exception saying that no connection adapters were found.
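For context, requests only ships connection adapters for http:// and https://, and a blob: URL refers to an object held in the memory of the browser tab that created it, so there is nothing on the server to request at that address. The following is a minimal sketch of a pre-flight check, using only the standard library; the helper name is_fetchable is hypothetical and not part of requests.

```python
from urllib.parse import urlsplit

# Schemes that an HTTP client such as requests can actually fetch.
FETCHABLE_SCHEMES = {"http", "https"}


def is_fetchable(url: str) -> bool:
    """Return True if the URL uses a scheme an HTTP client can request.

    Hypothetical helper for illustration: a blob: URL parses with the
    scheme "blob", so it is rejected here before any request is attempted.
    """
    return urlsplit(url).scheme.lower() in FETCHABLE_SCHEMES


blob_url = "blob:https://www.cdc.gov/c0162bc4-2fb0-4ece-8903-28e2544c9258"
page_url = "https://www.cdc.gov/poxvirus/monkeypox/response/2022/us-map.html"

print(is_fetchable(blob_url))  # the scheme is "blob", not "https"
print(is_fetchable(page_url))
```

A check like this only explains the error; getting the data itself means finding the real http(s) resource the page reads from.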

I also tried the solution given on this page using Selenium and the Chrome webdriver, but it raises an exception saying that the request failed with status 0.

Most other pages about downloading from blob:https sources have to do with videos, and rely on finding an m3u8 playlist to locate the actual URL for the download, but that strategy doesn't work in this case.

The other relevant link is here, but this is all in JS, and I'm not sure how to apply it to my case using Python. Is there a way to automate this so I don't have to manually go to the page and click the link every time I want to update the data?

CodePudding user response:

You can circumvent the download button and directly load the data like so:

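import json

import pandas as pd
import requests

# Fetch the JSON file directly instead of going through the download button
response = requests.get('https://www.cdc.gov/poxvirus/monkeypox/response/modules/MX-response-case-count-US.json')
response.raise_for_status()  # stop early on an HTTP error

data = json.loads(response.content)

# The records are stored under the "data" key
df = pd.DataFrame(data["data"])
df.head()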

Print out:

    State of Residence  Cases   Range   
0   Arizona             3       1 to 2  
1   Arkansas            1       1 to 2  
2   California        136       1 to 2  
3   Colorado            9       1 to 2  
4   Connecticut         3       1 to 2  
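Since the original goal was a CSV file, the same JSON can be written out as CSV. This is a rough sketch using only the standard library. It assumes the "data" key holds a list of flat dictionaries like the rows printed above; the helper names and the output filename monkeypox_cases.csv are made up for illustration, and the URL is the one from the answer, which may change over time.

```python
import csv
import io
import json
from urllib.request import urlopen

# URL taken from the answer above; it may move or disappear.
DATA_URL = "https://www.cdc.gov/poxvirus/monkeypox/response/modules/MX-response-case-count-US.json"


def fetch_records(url=DATA_URL):
    """Download the JSON and return the list stored under its "data" key."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)["data"]


def records_to_csv(records):
    """Serialize a list of dicts to CSV text; the first record's keys form the header."""
    if not records:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(records[0].keys()), extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def save_csv(path="monkeypox_cases.csv"):
    """Fetch the current data and write it to a CSV file (hypothetical filename)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(records_to_csv(fetch_records()))
```

Calling save_csv() from a scheduled job would refresh the file without opening the page. With pandas already loaded, df.to_csv("monkeypox_cases.csv", index=False) does the same in one line.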