I'm trying to create a date index on a dataframe built from the unique values of a column in another dataframe. My problem is that I can't set the index to expiration_date, because the key isn't recognized.
import pandas as pd
import requests
raw_data = requests.get("https://cdn.cboe.com/api/global/delayed_quotes/options/SPY.json")
dict_data = pd.DataFrame.from_dict(raw_data.json())
spot_price = dict_data.loc["current_price", "data"]
#create dataframe from options key
data = pd.DataFrame(dict_data.loc["options", "data"])
data['expiration_date'] = str(20) + data['option'].str.extract((r"[A-Z](\d+)")).astype(str)
data["expiration_date"] = pd.to_datetime(data["expiration_date"], format="%Y-%m-%d")
# create date dataframe
date_df = pd.DataFrame(data["expiration_date"].unique())
date_df.index = pd.to_datetime(date_df.index)
date_df.set_index('expiration_date', inplace=True)
print(date_df.index)
print(date_df.index.name)
print(date_df)
This gives me the error: KeyError: "None of ['expiration_date'] are in the columns"
I can get close if I use: date_df.index = pd.to_datetime(date_df.index)
However, my key ends up in a strange format: it turns into '1970-01-01 00:00:00.000000000 2022-09-21'
I've tried adding format="%Y-%m-%d", but it doesn't change the format.
If I use date_df.index = pd.to_datetime(date_df.index).strftime("%Y-%m-%d")
it does fix the date format, but I'm still left with 1970-01-01 and my index name is still None.
Using date_df.index.names = ['expiration_date']
lets me change the index name to expiration_date, but the column is still named 0 and the index holds the date 1970-01-01, which I don't want.
                         0
expiration_date
1970-01-01      2022-09-21
Now, if I try to set the index, I still get the "None of ['expiration_date'] are in the columns" error.
As you can see, I'm all over the place. What is the correct way to assign a date field as the index of a dataframe?
The commented-out line is where I'm stuck:
date_df = pd.DataFrame(data["expiration_date"].unique())
date_df.index.names = ['expiration_date']
date_df.index = pd.to_datetime(date_df.index).strftime("%Y-%m-%d")
# date_df.set_index('expiration_date', inplace=True)
print(date_df.index.name)
print(date_df)
CodePudding user response:
If you want a DataFrame that is a copy of your "data" DataFrame, indexed by the unique values of its 'expiration_date' column, you can use this code:
# copy data DataFrame and set its index as expiration_date
date_df = data.set_index("expiration_date")
# drop duplicated index
date_df = date_df[~date_df.index.duplicated(keep='first')]
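As a rough check of the two lines above, here is a minimal sketch that runs without the CBOE request. The option symbols and the "bid" column are made-up stand-ins, not real data, and the date parsing uses format="%Y%m%d" as an assumption about how the extracted digits look.

```python
import pandas as pd

# Hypothetical stand-in for the options table (not real CBOE data).
data = pd.DataFrame({
    "option": ["SPY220921C00300000", "SPY220921P00300000", "SPY220923C00300000"],
    "bid": [1.0, 2.0, 3.0],
})

# Take the digits that follow the ticker and prefix the century: "220921" -> "20220921".
digits = data["option"].str.extract(r"[A-Z](\d+)", expand=False)
data["expiration_date"] = pd.to_datetime("20" + digits, format="%Y%m%d")

# Use the date column as the index, then keep only the first row for each date.
date_df = data.set_index("expiration_date")
date_df = date_df[~date_df.index.duplicated(keep="first")]

print(date_df.index.name)
print(date_df)
```

Note that this keeps the remaining columns, taken from the first row seen for each expiration date.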
The issue with your existing code is this line: date_df = pd.DataFrame(data["expiration_date"].unique())
It creates a DataFrame with a default integer index (0 to length - 1) and a single column named 0 that holds your unique values, so there is no 'expiration_date' column for set_index to find. If this is what you want, you can change the line like this:
date_df = pd.DataFrame(data["expiration_date"].unique(),columns=["expiration_date"])
date_df.set_index('expiration_date', inplace=True)
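The 1970-01-01 values in the question come from calling pd.to_datetime on the default integer index: the integers 0, 1, 2, ... are read as nanoseconds since the Unix epoch. The sketch below reproduces that and then applies the fix above; the dates are made up and only stand in for data["expiration_date"].

```python
import pandas as pd

# Made-up expiration dates standing in for data["expiration_date"].
dates = pd.Series(
    pd.to_datetime(["2022-09-21", "2022-09-21", "2022-09-23"]),
    name="expiration_date",
)

# Without column names, the single column is labelled 0 and the index is 0..n-1.
unnamed = pd.DataFrame(dates.unique())
print(list(unnamed.columns))

# Converting that integer index treats 0, 1, ... as nanoseconds since 1970-01-01.
epoch_index = pd.to_datetime(unnamed.index)
print(epoch_index[0])

# Naming the column first gives set_index a key it can find.
date_df = pd.DataFrame(dates.unique(), columns=["expiration_date"])
date_df = date_df.set_index("expiration_date")
print(date_df.index.name)
print(date_df.index)
```

The resulting DataFrame has the dates as its index and no columns, which is the usual starting point if you plan to add per-date columns afterwards.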