I'm not sure if I'm going about this the right way. However, I'm trying to create a multi-index off three fields, which eventually will be three separate joins based upon input data.
I'm creating a second data frame in the example below, which I want to use a date lookup table, which will be able to join to the expiration date.
When I try to create the multi-index I get the following error:
ValueError: Length of names must match number of levels in MultiIndex.
I've tried turn the dataframe data_index into a series, but I'm still getting the error.
import pandas as pd
import numpy as np
import requests
from datetime import date
raw_data = requests.get(f"https://cdn.cboe.com/api/global/delayed_quotes/options/_SPX.json")
dict_data = pd.DataFrame.from_dict(raw_data.json())
#create dataframe from options key
data = pd.DataFrame(dict_data.loc["options", "data"])
# regex to strip the contract type, strike, and expire date
data["contract_type"] = data.option.str.extract(r"\d([A-Z])\d")
data["strike_price"] = data.option.str.extract(r"\d[A-Z](\d )\d\d\d").astype(int)
data["expiration_date"] = str(20) data.option.str.extract(r"[A-Z](\d )").astype(str)
# Convert expiration to datetime format
data["expiration_date"] = pd.to_datetime(data["expiration_date"], format="%Y-%m-%d")
data_index = pd.MultiIndex.from_frame(data, names=[["expiration_date", "strike_price", "contract_type"]]) ## this is where the error occurs
print(data_index)
date_df = pd.DataFrame(data[["expiration_date"]].unique())
date_df.set_index('expiration_date').join(data_index.set_index('expiration_date'))
today_date = date.today().strftime("%Y-%m-%d")
days_till_expire = (np.busday_count(today_date,date_df["expiration_date"].dt.strftime("%Y-%m-%d").to_list())/ 252)
date_df.loc[date_df["days_till_expire"].eq(0), "days_till_expire"] = 1 / 252
print(date_df)
How can I get the multi index to work with a join on a single index? Joining the date_df into the multi_index dataframe? I believe the logic below should work if the multi-index is setup correctly
CodePudding user response:
It's probably too long to put it into comments, so here in form of an answer:
I suppose that what you wanted to achieve is this (not from the entire frame, but from the three columns out of it):
data_index = pd.MultiIndex.from_arrays([data["expiration_date"], data["strike_price"], data["contract_type"]], names=["expiration_date", "strike_price", "contract_type"]) ## this is where the error occurs
For how to join on a single index check out: How to join a multi-index series to a single index dataframe with Pandas?
Does it help you?