How do I assign year&months in PD dataframe?-CodePudding

My pandaframe looks very weird after running the code. The data doesnt not come with a year/month variable so I have to add them manually. Is there a way I could do that?

sample = []

url1 = "https://api.census.gov/data/2018/cps/basic/jan?get=PEFNTVTY,PEMNTVTY&for=state:01&PEEDUCA=39&key=YOUR_KEY_GOES_HERE"
url2 = "https://api.census.gov/data/2018/cps/basic/feb?get=PEFNTVTY,PEMNTVTY&for=state:01&PEEDUCA=39&key=YOUR_KEY_GOES_HERE"
url3 = "https://api.census.gov/data/2018/cps/basic/mar?get=PEFNTVTY,PEMNTVTY&for=state:01&PEEDUCA=39&key=YOUR_KEY_GOES_HERE"

sample.append(requests.get(url1).text)
sample.append(requests.get(url2).text)
sample.append(requests.get(url3).text)

sample = [json.loads(i) for i in sample]
sample = pd.DataFrame(sample)
sample

This link is an image of what I get after running the code with my key.

CodePudding user response：

The response to each API call is a JSON array of arrays. You called the wrong DataFrame constructor. Try this:

base_url = "https://api.census.gov/data/2018/cps/basic"
params = {
    "get": "PEFNTVTY,PEMNTVTY",
    "for": "state:01",
    "PEEDUCA": 39,
}

df = []
for month in ["jan", "feb", "mar"]:
    r = requests.get(f"{base_url}/{month}", params=params)
    r.raise_for_status()
    
    j = r.json()
    df.append(pd.DataFrame.from_records(j[1:], columns=j[0]).assign(month=month))

df = pd.concat(df)

Result:

  PEFNTVTY PEMNTVTY PEEDUCA state month
0       57       57      39     1   jan
1       57       57      39     1   jan
2       57       57      39     1   jan
3       57       57      39     1   jan
4       57       57      39     1   jan
...

CodePudding user response：

Consider read_json to directly read the Census URL API inside a user-defined method. Then zip to iterate pairwise through years and months to build data frame and assign corresponding columns:

import pandas as pd
import calendar

def get_census_data(year, month):
    # BUILD DYNAMIC URL
    url = (
        f"https://api.census.gov/data/{year}/cps/basic/{month.lower()}?"
        "get=PEFNTVTY,PEMNTVTY&for=state:01"
    )

    # CLEAN RAW DATA FOR APPROPRIATE ROWS AND COLS, ASSIGN YEAR/MONTH COLS
    raw_df = pd.read_json(url)    
    cps_df = (
        pd.DataFrame(raw_df.iloc[1:,])
          .set_axis(raw_df.iloc[0,], axis="columns", inplace=False)
          .assign(year = year, month = month)
    )
    
    return cps_df

# MONTH AND YEAR LISTS
months = calendar.month_abbr[1:13]
years = [2018] * len(months)

# ITERATE PAIRWISE THROUGH LISTS
cps_list = [get_census_data(yr, mo) for yr, mo in zip(years, months)]

# COMPILE FINAL DATA FRAME
cps_df = pd.concat(cps_list, ignore_index=True)

Output

cps_df
0     PEFNTVTY PEMNTVTY state  year month
0           57       57     1  2018   Jan
1           57       57     1  2018   Jan
2           57       57     1  2018   Jan
3           57       57     1  2018   Jan
4           57       57     1  2018   Jan
       ...      ...   ...   ...   ...
29710       57       57     1  2018   Dec
29711       57       57     1  2018   Dec
29712       57       57     1  2018   Dec
29713       57       57     1  2018   Dec
29714       57       57     1  2018   Dec

[29715 rows x 5 columns]