I'm trying to write a python script that will download data from https://data.cms.gov/provider-data/dataset/g6vv-u9sr and perform different operations on the dataset. I'm having trouble automating the extraction of this data, and am not sure how to properly write a query that will return me the entire dataset (preferably in the form of a csv for pandas). Any pointers?
CodePudding user response:
You can use requests
module to download the CSV data, for example:
import pandas as pd
from io import StringIO
r = requests.get(
"https://data.cms.gov/provider-data/sites/default/files/resources/72ed1971c684c81da254c00145da1b47_1647887934/NH_Penalties_Mar2022.csv"
)
df = pd.read_csv(StringIO(r.text))
print(df.dtypes)
print(len(df))
Prints:
Federal Provider Number object
Provider Name object
Provider Address object
Provider City object
Provider State object
Provider Zip Code int64
Penalty Date object
Penalty Type object
Fine Amount float64
Payment Denial Start Date object
Payment Denial Length in Days float64
Location object
Processing Date object
dtype: object
27881
EDIT: As @Parfait stated, you can use the url in pd.read_csv
directly. However, in this case is necessary to explicitly set enoding=
parameter ("latin1"/"iso_8859-1" works):
df = pd.read_csv(
"https://data.cms.gov/provider-data/sites/default/files/resources/72ed1971c684c81da254c00145da1b47_1647887934/NH_Penalties_Mar2022.csv",
encoding="iso_8859-1",
)
print(len(df))
Prints:
27881