Home > Back-end >  Writing a query for an entire table
Writing a query for an entire table

Time:04-22

I'm trying to write a python script that will download data from https://data.cms.gov/provider-data/dataset/g6vv-u9sr and perform different operations on the dataset. I'm having trouble automating the extraction of this data, and am not sure how to properly write a query that will return me the entire dataset (preferably in the form of a csv for pandas). Any pointers?

CodePudding user response:

You can use requests module to download the CSV data, for example:

import pandas as pd
from io import StringIO

r = requests.get(
    "https://data.cms.gov/provider-data/sites/default/files/resources/72ed1971c684c81da254c00145da1b47_1647887934/NH_Penalties_Mar2022.csv"
)

df = pd.read_csv(StringIO(r.text))
print(df.dtypes)
print(len(df))

Prints:

Federal Provider Number           object
Provider Name                     object
Provider Address                  object
Provider City                     object
Provider State                    object
Provider Zip Code                  int64
Penalty Date                      object
Penalty Type                      object
Fine Amount                      float64
Payment Denial Start Date         object
Payment Denial Length in Days    float64
Location                          object
Processing Date                   object
dtype: object

27881

EDIT: As @Parfait stated, you can use the url in pd.read_csv directly. However, in this case is necessary to explicitly set enoding= parameter ("latin1"/"iso_8859-1" works):

df = pd.read_csv(
    "https://data.cms.gov/provider-data/sites/default/files/resources/72ed1971c684c81da254c00145da1b47_1647887934/NH_Penalties_Mar2022.csv",
    encoding="iso_8859-1",
)
print(len(df))

Prints:

27881
  • Related