How to create 1 linear regression for two groups of data

Time:03-15

I have two scatter plots drawn on the same axes. I want to fit a single linear regression line to the points of y1 and y2 combined (that is, regress the pooled y1 and y2 values on x), but I'm having difficulty because I usually fit a line to y1 or y2 separately. I also want the r^2 value for the combined y1 and y2 data. I would appreciate any help I can get!

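import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# 15 rows of random integers in [0, 100), in columns A and B
df1 = pd.DataFrame(np.random.randint(0, 100, size=(15, 2)), columns=list('AB'))

y1 = df1['A']
y2 = df1['B']

# Both series use the DataFrame index as x
plt.scatter(df1.index, y1)
plt.scatter(df1.index, y2)
plt.show()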

User response:

It sounds like you want to 'stack' columns A and B into a single column. There are many ways to do this; here is one using stack:

df2 = df1.stack().rename('A_and_B').reset_index(level = 1, drop = True).to_frame()

Then df2.head() looks like this:


    A_and_B
0   35
0   58
1   49
1   73
2   44
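
As an illustration of one of the other ways, here is a minimal sketch that builds the same column with pd.concat instead of stack. The small df1 below is made up for the example (it is not the random data from the question). The only difference is row order: stack interleaves A and B row by row, while concat puts all of A first and then all of B, which should not matter for a regression.

```python
import pandas as pd

# Small fixed frame standing in for df1 (values are made up for illustration).
df1 = pd.DataFrame({'A': [35, 49, 44], 'B': [58, 73, 61]})

# Approach from the answer: interleaves A and B row by row.
stacked = df1.stack().rename('A_and_B').reset_index(level=1, drop=True).to_frame()

# Alternative: append column B below column A, keeping the original index.
concatenated = pd.concat([df1['A'], df1['B']]).rename('A_and_B').to_frame()

# Both frames hold the same (index, value) pairs, just in a different order.
same_points = sorted(zip(stacked.index, stacked['A_and_B'])) == sorted(
    zip(concatenated.index, concatenated['A_and_B'])
)
print(same_points)
```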

and the scatter plot:

plt.scatter(df2.index, df2['A_and_B'])

shows the points from A and B as a single series (image not included).

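I don't know how you run your regressions, but you can now apply your method to df2. For example, with statsmodels (note that sm.OLS does not add an intercept on its own, so this fits a line through the origin, which is why the summary reports an uncentered R-squared):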

import statsmodels.api as sm

# Regress the stacked values on the index (endog first, exog second)
res = sm.OLS(df2['A_and_B'], df2.index).fit()
print(res.summary())

output:

                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                A_and_B   R-squared (uncentered):                   0.517
Model:                            OLS   Adj. R-squared (uncentered):              0.501
Method:                 Least Squares   F-statistic:                              31.10
Date:                Mon, 14 Mar 2022   Prob (F-statistic):                    5.11e-06
Time:                        23:02:47   Log-Likelihood:                         -152.15
No. Observations:                  30   AIC:                                      306.3
Df Residuals:                      29   BIC:                                      307.7
Df Model:                           1                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             4.8576      0.871      5.577      0.000       3.076       6.639
==============================================================================
Omnibus:                        3.466   Durbin-Watson:                   1.244
Prob(Omnibus):                  0.177   Jarque-Bera (JB):                1.962
Skew:                          -0.371   Prob(JB):                        0.375
Kurtosis:                       1.990   Cond. No.                         1.00
==============================================================================

Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
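
As note [1] says, the R-squared above is uncentered because the model has no constant. If you want the usual line with an intercept and the ordinary (centered) r^2, a rough sketch using plain NumPy could look like the following. The seeded random data is an assumption standing in for df1, and the variable names (slope, intercept, r_squared) are just illustrative.

```python
import numpy as np
import pandas as pd

# Made-up data standing in for df1; any two-column frame works the same way.
rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.integers(0, 100, size=(15, 2)), columns=list('AB'))

# Stack A and B so every index value appears once per column.
df2 = df1.stack().rename('A_and_B').reset_index(level=1, drop=True).to_frame()

x = df2.index.to_numpy(dtype=float)
y = df2['A_and_B'].to_numpy(dtype=float)

# Degree-1 least-squares fit: y is approximated by slope * x + intercept.
slope, intercept = np.polyfit(x, y, 1)

# Centered R^2 = 1 - SS_res / SS_tot.
y_hat = slope * x + intercept
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"slope={slope:.4f}, intercept={intercept:.4f}, R^2={r_squared:.4f}")
```

To draw the line over the scatter plot, something like plt.plot(x, y_hat) after plt.scatter(...) should do. In statsmodels, the same intercept model can be fit by wrapping the regressor in sm.add_constant before passing it to sm.OLS.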
