Linear regression loop in Python (with 3 variables)

Time:07-23

I'm attempting to run a linear regression inside a loop, with two independent variables (salinity and temperature) and one dependent variable (alkalinity). I've created ensemble arrays in which each of the 74 data points is perturbed by 1,000 random draws. I can run this first segment without any issues, but I'm having trouble looping the linear regression function.

import numpy as np
import statsmodels.api as sm
from sklearn import linear_model

x = glodap_hot_merged_finalized['G2salinity']
y = glodap_hot_merged_finalized['G2talk']
z = glodap_hot_merged_finalized['G2temperature']

iterations = 1000

stdevs = np.empty((iterations,), dtype=float)
slopes = np.empty((iterations,), dtype=float)
intercepts = np.empty((iterations,), dtype=float)

nbot = len(x)
sal = x.values
alk = y.values
temp = z.values
           
# Add Gaussian noise to every observation; each row is one realization
sal_ens = np.random.randn(iterations, nbot) * 1e-3 + sal[np.newaxis, :]
alk_ens = np.random.randn(iterations, nbot) * 2 + alk
temp_ens = np.random.randn(iterations, nbot) * 1e-2 + temp[np.newaxis, :]

# the shapes for sal_ens, alk_ens, and temp_ens are all (1000,74)
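Since `glodap_hot_merged_finalized` isn't available here, the following is a minimal sketch of the same ensemble construction using made-up stand-in values (the means and spreads of `sal`, `temp`, and `alk` are hypothetical; only the noise scales 1e-3, 2 and 1e-2 come from the code above). It shows how broadcasting a (74,) row against a (1000, 74) noise array yields the (1000, 74) ensembles.

```python
import numpy as np

# Hypothetical stand-in data: the real values come from
# glodap_hot_merged_finalized, which is not available here.
rng = np.random.default_rng(0)
nbot = 74
iterations = 1000
sal = 35.0 + rng.standard_normal(nbot) * 0.2     # salinity
temp = 20.0 + rng.standard_normal(nbot) * 3.0    # temperature
alk = 2300.0 + 60.0 * (sal - 35.0) + rng.standard_normal(nbot) * 5.0  # alkalinity

# Each row is one Monte Carlo realization: the observed profile plus
# Gaussian noise scaled by the assumed measurement uncertainty.
# A (74,) array broadcasts across the 1000 rows with or without np.newaxis.
sal_ens = rng.standard_normal((iterations, nbot)) * 1e-3 + sal[np.newaxis, :]
alk_ens = rng.standard_normal((iterations, nbot)) * 2 + alk
temp_ens = rng.standard_normal((iterations, nbot)) * 1e-2 + temp[np.newaxis, :]

print(sal_ens.shape, alk_ens.shape, temp_ens.shape)  # (1000, 74) (1000, 74) (1000, 74)
```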

I've been trying to run the following loop in Python with the sal_ens, temp_ens, and alk_ens variables:

for i in range(iterations):

    X = sal_ens[i], temp_ens[i]
    Y = alk_ens[i]
 
    regr = linear_model.LinearRegression()
    regr.fit(X, Y)

    intercept_value = sm.add_constant(X) 
    
    intercept =  intercept_value[i]
    coef = regr.coef_[i]

I keep getting an error message that says:

ValueError: Found input variables with inconsistent numbers of samples: [2,74]

I'm trying to run 1,000 regressions, each using one random realization of sal_ens, temp_ens, and alk_ens, in order to get 1,000 different slopes and intercepts.

Any input or help with this would be greatly appreciated!

Answer:

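To resolve your error, you just need to build your X array properly. Currently, your code makes X a tuple of two 1-dimensional arrays, each with shape (74,). scikit-learn converts that tuple into an array of shape (2, 74), i.e. 2 samples with 74 features each, while Y has 74 samples; that mismatch is the [2, 74] in the error message. If you look at the documentation for the LinearRegression().fit() method, you can see that X needs to be an array-like object of shape (n_samples, n_features).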
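A small sketch reproducing the problem on one iteration, using random stand-ins for `sal_ens[i]`, `temp_ens[i]` and `alk_ens[i]` (the names `sal_row`, `temp_row`, `alk_row` are illustrative only). It also uses `np.column_stack`, which builds the same (74, 2) array as the `hstack`/`reshape` combination, as an alternative way to get the required layout.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Hypothetical stand-ins for one iteration: sal_ens[i], temp_ens[i], alk_ens[i]
sal_row = rng.standard_normal(74)
temp_row = rng.standard_normal(74)
alk_row = rng.standard_normal(74)

# What the question's loop builds: a tuple of two (74,) arrays.
X_tuple = sal_row, temp_row
print(np.asarray(X_tuple).shape)  # (2, 74): read as 2 samples x 74 features

error = None
try:
    LinearRegression().fit(X_tuple, alk_row)  # X has 2 samples, y has 74
except ValueError as exc:
    error = exc
print(error)

# One row per sample, one column per feature: shape (74, 2).
X = np.column_stack((sal_row, temp_row))
print(X.shape)  # (74, 2)
model = LinearRegression().fit(X, alk_row)
print(model.coef_.shape)  # (2,): one slope per feature
```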

So, you can replace that line with this:

X = np.hstack((sal_ens[i].reshape(-1,1), temp_ens[i].reshape(-1,1)))

The .reshape(-1,1) will convert each of the two arrays to a 2-dimensional array of shape (74,1) and then np.hstack(...) will stack them horizontally to give you your desired array of shape (74,2).
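Putting that fix back into the question's loop, a minimal sketch of the whole run might look like the following, again with hypothetical stand-in ensembles of the question's (1000, 74) shape. `LinearRegression` fits an intercept by default (`fit_intercept=True`) and stores it in `intercept_`, with one slope per feature in `coef_`, so `sm.add_constant` isn't needed for this; `regr.coef_[i]` would also fail once `i` reaches 2, because `coef_` holds only two values. Since each regression yields two slopes, `slopes` is preallocated here as (iterations, 2) rather than the question's (iterations,).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in ensembles with the question's shapes (1000, 74);
# in the real code these are sal_ens, temp_ens and alk_ens.
rng = np.random.default_rng(1)
iterations, nbot = 1000, 74
sal = 35.0 + rng.standard_normal(nbot) * 0.2
temp = 20.0 + rng.standard_normal(nbot) * 3.0
alk = 2300.0 + 60.0 * (sal - 35.0) - 2.0 * (temp - 20.0) + rng.standard_normal(nbot) * 5.0
sal_ens = rng.standard_normal((iterations, nbot)) * 1e-3 + sal
temp_ens = rng.standard_normal((iterations, nbot)) * 1e-2 + temp
alk_ens = rng.standard_normal((iterations, nbot)) * 2 + alk

intercepts = np.empty(iterations)
slopes = np.empty((iterations, 2))  # column 0: salinity, column 1: temperature

for i in range(iterations):
    # (74, 2) design matrix: one row per sample, one column per feature
    X = np.hstack((sal_ens[i].reshape(-1, 1), temp_ens[i].reshape(-1, 1)))
    Y = alk_ens[i]
    regr = LinearRegression().fit(X, Y)  # intercept is fitted by default
    intercepts[i] = regr.intercept_
    slopes[i] = regr.coef_

print(slopes.mean(axis=0), slopes.std(axis=0))
print(intercepts.mean(), intercepts.std())
```

The mean and standard deviation across the 1,000 rows of `slopes` and `intercepts` then summarize how the assumed measurement noise propagates into the fitted coefficients.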
