The data comes from 'copper-new.txt': https://storage.googleapis.com/aipi_datasets/copper-new.txt
I am looking at a dataset that shows how the thermal expansion coefficient of copper changes with temperature. I want to model the relationship between the thermal expansion coefficient and temperature so that I can predict the coefficient for any new temperature value. I will use a linear regression model.
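For reference, the model being fitted and the metric being reported can be written as below. The notation is chosen here for illustration: T is temperature, alpha is the thermal expansion coefficient, and m and b are the slope and intercept that scikit-learn's LinearRegression exposes as coef_ and intercept_.

```latex
\hat{\alpha}(T) = m\,T + b,
\qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\alpha_i - \hat{\alpha}(T_i)\right|
```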
# Need to split into columns since Pandas did not do it for us
copperdata['X'] = copperdata.apply(lambda x: x.str.split()[0][1],axis=1)
copperdata['y'] = copperdata.apply(lambda x: x.str.split()[0][0],axis=1)
copperdata = copperdata[['X','y']].astype(float)
copperdata.head()
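Instead of splitting strings row by row, pandas can split on whitespace while reading. The sketch below assumes (this is not confirmed by the question) that the file has no header row and that each line holds the coefficient first and the temperature second, which is the order the str.split()[0][0] / [0][1] indexing above implies. load_copper is a hypothetical helper name, and the in-memory sample lines are illustrative only.

```python
import io

import pandas as pd


def load_copper(source):
    # Assumption: whitespace-delimited, no header row, coefficient in the
    # first column and temperature in the second.
    df = pd.read_csv(source, sep=r"\s+", header=None, names=["y", "X"])
    # Reorder so the feature comes first, like copperdata[['X', 'y']] above
    return df[["X", "y"]].astype(float)


# Small in-memory stand-in for the real file (illustrative values)
sample = io.StringIO("0.591 24.41\n1.547 34.82\n2.902 44.09\n")
copperdata = load_copper(sample)
print(copperdata)
```

Because read_csv also accepts URLs, passing the dataset URL instead of the StringIO should work the same way. If the file does turn out to have a header row, pass header=0 so that row is replaced by the given names.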
Now I need to complete the function lin_model() below, which takes the dataframe (copperdata) as input and does the following:
- Create NumPy arrays for X (the input feature, temperature) and y (the target, thermal expansion coefficient). X must be reshaped into a 2D array whose second dimension is 1 so that it works as input to scikit-learn's models.
- Split the data into training, validation and test sets. Use 10% of the total data for the test set. Of the remaining 90%, use 80% for training and 20% for validation. Set random_state=0 when splitting the data.
- Train a linear regression model and compute the MAE on the validation set.
- Return 1) the trained model and 2) the validation MAE.
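The two-stage split works out to 0.9 × 0.8 = 72% training, 0.9 × 0.2 = 18% validation and 10% test overall. This sketch checks those proportions with train_test_split on 100 dummy rows (not the copper data), so the percentages map directly onto row counts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 100 dummy rows so the percentages map directly onto counts
X = np.arange(100, dtype=float).reshape(-1, 1)   # shape (100, 1)
y = np.arange(100, dtype=float)                  # shape (100,)

# Step 1: hold out 10% of all rows as the test set
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=0.10, random_state=0)

# Step 2: split the remaining 90% into 80% training / 20% validation
X_train, X_val, y_train, y_val = train_test_split(
    X_train_full, y_train_full, test_size=0.20, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 72 18 10
```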
def lin_model(df):
    # YOUR CODE HERE
    X = copperdata['X']
    np.asarray('X')
    y = copperdata['y']
    np.asarray('y')
    X.values.reshape(-1, 1)
    y.values.reshape(-1, 1)
    X_train_full,X_test,_y_train_full,y_test = train_test_split(X,y,random_state=0,test_size = .10)
    X_train,X_val,y_train,y_val = train_test_split(X_train_full,y_train_full,random_state = 0, test_size = .20)
    model = LinearRegression()
    model.fit(X_train,y_train)
    val_preds = model.predict(X_val)
    mae = metrics.mean_absolute_error(y_test,y_pred)
    return model, mae
    raise NotImplementedError()
When I run this test cell:
model,mae = lin_model(copperdata)
I get this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-112-8931ac554101> in <module>
1 # Test cell
----> 2 model,mae = lin_model(copperdata)
3
4 # Print model coefficients and intercept
5 m = model.coef_
<ipython-input-111-ba2035c8e718> in lin_model(df)
6 model.fit(X_train,y_train)
7 #val_preds = model.predict(X_val)
----> 8 y_preds = model.predict(X_test)
9 mae = metrics.mean_absolute_error(y_test,y_pred)
10 return model, mae
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/linear_model/_base.py in predict(self, X)
234 Returns predicted values.
235 """
--> 236 return self._decision_function(X)
237
238 _preprocess_data = staticmethod(_preprocess_data)
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/linear_model/_base.py in _decision_function(self, X)
216 check_is_fitted(self)
217
--> 218 X = check_array(X, accept_sparse=['csr', 'csc', 'coo'])
219 return safe_sparse_dot(X, self.coef_.T,
220 dense_output=True) + self.intercept_
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
70 FutureWarning)
71 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72 return f(**kwargs)
73 return inner_f
74
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
617 # If input is 1D raise error
618 if array.ndim == 1:
--> 619 raise ValueError(
620 "Expected 2D array, got 1D array instead:\narray={}.\n"
621 "Reshape your data either using array.reshape(-1, 1) if "
ValueError: Expected 2D array, got 1D array instead:
array=[656.2 544.47 524.7 60.41 447.41 89.57].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
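The error comes from scikit-learn's input validation: estimators expect X with shape (n_samples, n_features), even when there is only one feature. This minimal reproduction uses made-up values: predicting on a 1D array raises the same ValueError, and reshaping to (n, 1) is accepted.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[60.41], [89.57], [447.41], [524.7]])  # 2D: shape (4, 1)
y = np.array([1.0, 2.0, 3.0, 4.0])                    # made-up targets
model = LinearRegression().fit(X, y)

X_1d = np.array([544.47, 656.2])   # 1D: shape (2,), like the array in the traceback
try:
    model.predict(X_1d)
except ValueError as err:
    message = str(err)
    print(message.splitlines()[0])

preds = model.predict(X_1d.reshape(-1, 1))  # shape (2, 1) is accepted
print(preds.shape)  # (2,)
```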
Answer:
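You need to reshape the explanatory variable X and assign the result back to it (X = X.values.reshape(-1, 1)). reshape returns a new array instead of modifying X in place, so without the assignment the reshaped array is discarded and the model still receives 1D input. The same goes for np.asarray('X'), which converts the string 'X' rather than the column and whose result is also thrown away. The target y can stay 1D.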
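The difference between calling reshape and assigning its result can be seen directly. In this sketch a small Series stands in for copperdata['X'] (the values are taken from the traceback above).

```python
import numpy as np
import pandas as pd

X = pd.Series([656.2, 544.47, 524.7])   # stand-in for copperdata['X']

X.values.reshape(-1, 1)      # returns a new 2D array, which is thrown away here
shape_before = X.shape       # still (3,): the Series itself is unchanged

X = X.values.reshape(-1, 1)  # assigning the result is what keeps the 2D version
shape_after = X.shape        # (3, 1)

# np.asarray('X') converts the *string* 'X', not the column named 'X'
s = np.asarray('X')
print(shape_before, shape_after, s.ndim)  # (3,) (3, 1) 0
```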
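from sklearn import metrics
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

def lin_model(df):
    # X must be 2D (n_samples, 1); reassign because reshape returns a new array
    X = df['X'].values.reshape(-1, 1)
    y = df['y'].values
    # Hold out 10% of all rows as the test set
    X_train_full, X_test, y_train_full, y_test = train_test_split(X, y, random_state=0, test_size=0.10)
    # Split the remaining 90% into 80% training / 20% validation
    X_train, X_val, y_train, y_val = train_test_split(X_train_full, y_train_full, random_state=0, test_size=0.20)
    model = LinearRegression()
    model.fit(X_train, y_train)
    # Evaluate on the validation set, not the test set
    val_preds = model.predict(X_val)
    mae = metrics.mean_absolute_error(y_val, val_preds)
    return model, mae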
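Once a model is trained, the test cell reads model.coef_, and the stated goal is to predict the coefficient for a new temperature. This sketch uses synthetic, noise-free data on the line y = 0.01*T + 2 (illustrative only, not the copper measurements) to show where the slope and intercept live and that a single new temperature must also be passed as a 2D array.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data lying exactly on y = 0.01*T + 2 (illustrative only)
T = np.linspace(50, 800, 20).reshape(-1, 1)
alpha = 0.01 * T.ravel() + 2.0

model = LinearRegression().fit(T, alpha)

m = model.coef_[0]       # slope: coef_ holds one coefficient per feature
b = model.intercept_     # intercept

# A new temperature is passed as one row with one column: shape (1, 1)
new_T = np.array([[300.0]])
pred = model.predict(new_T)[0]
print(round(float(m), 6), round(float(b), 6), round(float(pred), 6))  # 0.01 2.0 5.0
```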