How to split a dataframe with multiple curve data points

Time:07-06

I have a dataframe of corporate bonds of several types, with two columns holding their yields (YTW) and years to maturity (YTM). When I plot yield against years to maturity, I can clearly see at least three, possibly four, yield curves. I would like to fit at least three curves to this data and then split the dataframe into smaller chunks based on each point's distance to the curves. Here's a simple scatter plot:

[Figure: scatter plot of YTW against YTM]

I have tried the RANSAC method, which gave this plot:

[Figure: Ransac plot]

Here's the code I used:

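import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import RANSACRegressor
from sklearn.metrics import mean_squared_error

y_ax = df_clean.YTW
x_ax = df_clean.YTM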

class PolynomialRegression(object):
    def __init__(self, degree=3, coeffs=None):
        self.degree = degree
        self.coeffs = coeffs

    def fit(self, X, y):
        self.coeffs = np.polyfit(X.ravel(), y, self.degree)

    def get_params(self, deep=False):
        return {'coeffs': self.coeffs}

    def set_params(self, coeffs=None, random_state=None):
        self.coeffs = coeffs

    def predict(self, X):
        poly_eqn = np.poly1d(self.coeffs)
        y_hat = poly_eqn(X.ravel())
        return y_hat

    def score(self, X, y):
        return mean_squared_error(y, self.predict(X))

poly_degree = 3
ransac = RANSACRegressor(PolynomialRegression(degree=poly_degree),
                         residual_threshold=2 * np.std(y_ax),
                         random_state=0)
ransac.fit(np.expand_dims(x_ax, axis=1), y_ax)
inlier_mask = ransac.inlier_mask_

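# Evaluate the fitted polynomial on a sorted grid so the curve draws as a clean line
x_grid = np.linspace(x_ax.min(), x_ax.max(), 200)
y_hat = ransac.predict(np.expand_dims(x_grid, axis=1))
plt.plot(x_ax, y_ax, 'bx', label='input samples')
plt.plot(x_ax[inlier_mask], y_ax[inlier_mask], 'go', label='inliers (2*STD)')
plt.plot(x_grid, y_hat, 'r-', label='estimated curve')
plt.legend()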

In short:

  1. Is there a way to fit 3-4 separate curves to these points?
  2. How can I split the dataframe based on these curves?

The entire data is here (only YTW and YTM are plotted).

[Figure: First groupby]

Notice how each curve appears to have its own color, suggesting that this separation is related to the exponentials you described. If you move the fig, ax = plt.subplots() line inside the loop, you'll see each group separately. When I did that, I saw many groups with very few points, so I used a simple heuristic to filter them: keep only groups with more than 5 points.
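The length heuristic can also be applied in one step with groupby(...).filter, which gives the split into per-group dataframes almost for free. A minimal sketch on made-up data: the column names issuer, YTM and YTW follow the code in this answer, while the values and the names big and chunks are invented for illustration.

```python
import pandas as pd

# Toy data: issuer "A" has 6 points, issuer "B" only 2 (values are made up)
df = pd.DataFrame({
    "issuer": ["A"] * 6 + ["B"] * 2,
    "YTM": [1, 2, 3, 5, 7, 10, 1, 2],
    "YTW": [2.0, 2.4, 2.7, 3.1, 3.3, 3.5, 4.0, 4.2],
})

# Keep only the rows whose issuer group has more than 5 points
big = df.groupby("issuer").filter(lambda g: len(g) > 5)

# Split the filtered frame into one sub-dataframe per issuer, sorted by maturity
chunks = {name: sub.sort_values("YTM") for name, sub in big.groupby("issuer")}
print(list(chunks))  # ['A']
```

Each value in chunks is an ordinary dataframe, so it can be fitted or plotted on its own.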

Now it's time to fit. I used an exponential plateau function (exp_plateau, with parameters ym, y0 and k), which gave this final plot:

[Figure: Final plot]
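The definition of exp_plateau did not survive in this post. A minimal sketch, assuming the common exponential plateau form y = ym - (ym - y0) * exp(-k * x), which matches the parameter names ym, y0 and k used here. The synthetic data and the starting guess p0 are illustrative assumptions, not taken from the bond data.

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_plateau(x, ym, y0, k):
    """Assumed model: starts at y0 for x = 0 and levels off at ym with rate k."""
    return ym - (ym - y0) * np.exp(-k * x)

# Sanity check on noiseless synthetic data with known parameters
x = np.linspace(0.5, 30, 40)
y = exp_plateau(x, 5.5, 3.0, 0.2)

# A rough starting guess (plateau ~ max, start ~ min) helps the optimizer converge
popt, pcov = curve_fit(exp_plateau, x, y, p0=[y.max(), y.min(), 0.1])
print(np.round(popt, 3))
```

With real, noisy data, passing a p0 like this may also reduce the number of groups that fail to fit, though that is not guaranteed.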

Here's the code that generated it:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# exp_plateau(x, ym, y0, k) is the model function and must already be defined
fig, ax = plt.subplots()
param_names = ['ym', 'y0', 'k']
for issuer, subdf in df.groupby('issuer'):
    subdf = subdf.sort_values('YTM')
    if len(subdf) <= 5:
        continue  # too few points for a meaningful fit
    try:
        popt, pcov = curve_fit(exp_plateau, subdf['YTM'], subdf['YTW'])
    except RuntimeError:
        print(f"{issuer} couldn't be fit. Skipping")
        continue

    # Data as a solid line, fitted curve dashed in the same color
    line, = ax.plot(subdf['YTM'], subdf['YTW'])
    ax.plot(subdf['YTM'], exp_plateau(subdf['YTM'], *popt), color=line.get_color(), ls='--')
    perr = np.sqrt(np.diag(pcov))  # one-sigma parameter uncertainties
    print(f'"{issuer}"', *[f'{name}: {val:.2f} +/-{err:.2f}' for name, val, err in zip(param_names, popt, perr)])
    # ax.set_title(issuer)
ax.set_xlabel('YTM')
ax.set_ylabel('YTW')

And the output with the fit parameters and their estimated errors:

"407 INTER INC CPN STRIP" ym: 5.66 +/-0.09 y0: 3.80 +/-0.05 k: 0.29 +/-0.03
"AGT LTD COUPON STRIP" ym: 6.42 +/-1.66 y0: 3.09 +/-0.13 k: 0.23 +/-0.17
"BANK OF NOVA SCOTIA" ym: 5.47 +/-0.24 y0: 3.36 +/-0.08 k: 0.43 +/-0.10
"BCE COUPON STRIP" ym: 8.04 +/-0.47 y0: 3.95 +/-0.03 k: 0.05 +/-0.01
"BRCOL GENERIC STRIP" ym: 4.30 +/-0.01 y0: -0.12 +/-0.68 k: 0.29 +/-0.02
"CANADIAN IMP BK COMM HK" ym: 3.37 +/-0.17 y0: 2.27 +/-0.53 k: 5.88 +/-5.41
"CANADIAN TIRE CPN STRIP" ym: 7.30 +/-0.42 y0: 3.81 +/-0.04 k: 0.08 +/-0.02
"GREAT-WEST LIFECO CPN ST" ym: 55.02 +/-918.22 y0: 3.79 +/-0.18 k: 0.00 +/-0.06
"GREATER TORONTO CPN STRP" ym: 6.31 +/-0.44 y0: 3.56 +/-0.04 k: 0.08 +/-0.02
"HYDRO ONE STRIP" ym: 5.48 +/-0.16 y0: 3.10 +/-0.08 k: 0.19 +/-0.03
"LEVIS QUE COUPON STRIP" ym: 3.88 +/-0.08 y0: 2.81 +/-0.03 k: 0.32 +/-0.05
LOBLAW COS CPN STRIP couldn't be fit. Skipping
"NEW BRUN GENERIC CPN STP" ym: 4.32 +/-0.01 y0: 2.59 +/-0.07 k: 0.27 +/-0.02
"SAGUENAY CPN STRIP" ym: 3.89 +/-0.06 y0: 2.78 +/-0.04 k: 0.33 +/-0.05
"SUN LIFE FIN SPN STRIP" ym: 6.47 +/-0.20 y0: 3.98 +/-0.06 k: 0.11 +/-0.02
"TELUS CORP COUPON STRIP" ym: 5.99 +/-0.04 y0: 3.29 +/-0.07 k: 0.22 +/-0.01
TORONTO DOMINION STRIP couldn't be fit. Skipping
"TRANS-CANADA CPN STRIP" ym: 6.53 +/-0.39 y0: 3.85 +/-0.06 k: 0.13 +/-0.03
"TRANSALTA CORP CPN STRIP" ym: 8.91 +/-1.91 y0: 4.95 +/-0.07 k: 0.07 +/-0.05
"WINNIPEG COUPON STRIP" ym: 4.70 +/-0.03 y0: -1142.59 +/-259776205.48 k: 1.06 +/-33065.14
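Once curves are fitted, one way to split points by distance to the curves, as the question asks, is to assign each point to the curve with the smallest vertical residual. This is only a sketch under assumptions: the form of exp_plateau is assumed (the common exponential plateau equation), the two parameter sets are copied from the output above, and the four points in pts are made up.

```python
import numpy as np
import pandas as pd

def exp_plateau(x, ym, y0, k):
    # Assumed model form: starts at y0 and levels off at ym
    return ym - (ym - y0) * np.exp(-k * x)

# (ym, y0, k) for two issuers, copied from the fit output above
curves = {
    "HYDRO ONE STRIP": (5.48, 3.10, 0.19),
    "BCE COUPON STRIP": (8.04, 3.95, 0.05),
}

# Made-up points to classify
pts = pd.DataFrame({"YTM": [2.0, 10.0, 20.0, 5.0], "YTW": [3.9, 5.1, 6.5, 4.8]})
x = pts["YTM"].to_numpy()
y = pts["YTW"].to_numpy()

# Vertical distance from every point to every curve (rows: points, columns: curves)
dist = np.column_stack([np.abs(y - exp_plateau(x, *p)) for p in curves.values()])

# Label each point with its closest curve, then split on that label
names = np.array(list(curves))
pts["nearest_curve"] = names[dist.argmin(axis=1)]
chunks = {name: sub for name, sub in pts.groupby("nearest_curve")}
```

Vertical distance is the simplest choice. Where curves lie close together or cross, assignments become ambiguous, so a maximum-distance cutoff for leaving points unassigned may be worth adding.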