reshaping a series into a 2D array

Time:09-29

The data comes from 'copper-new.txt': https://storage.googleapis.com/aipi_datasets/copper-new.txt

I am looking at a dataset that shows how the thermal expansion coefficient of copper changes with temperature. I am trying to model the relationship between the thermal expansion coefficient and temperature so that I can predict the coefficient for any new temperature value. I will use a linear regression model.

# Need to split into columns since Pandas did not do it for us
copperdata['X'] = copperdata.apply(lambda x: x.str.split()[0][1],axis=1)
copperdata['y'] = copperdata.apply(lambda x: x.str.split()[0][0],axis=1)
copperdata = copperdata[['X','y']].astype(float)
copperdata.head()
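
As an alternative to splitting each row's string by hand, pandas can usually parse whitespace-separated columns directly. This is a minimal sketch, assuming the file holds two whitespace-separated numeric columns with no header row, coefficient first and temperature second (the order implied by the indexing above). The sample rows are made-up values for illustration, not rows from copper-new.txt.

```python
import io
import pandas as pd

# Made-up rows standing in for the real file (assumption: "y X" per line, no header)
sample = io.StringIO("1.0 10.0\n2.0 20.0\n3.0 30.0\n")

# sep=r"\s+" splits on any run of whitespace; header=None means no header row
copperdata = pd.read_csv(sample, sep=r"\s+", header=None, names=["y", "X"])

# Reorder so the feature comes first, as in the original snippet
copperdata = copperdata[["X", "y"]].astype(float)
print(copperdata.head())
```

If the real file has a header or preamble lines, `skiprows` would need adjusting; that is not something the question states.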

Now I need to complete the function lin_model() below, which takes the dataframe (copperdata) as input and performs the following steps. Create NumPy arrays for X (the input feature, temperature) and y (the target, thermal expansion coefficient). Reshape the X array into a 2D array whose second dimension is 1 so that it works as an input to scikit-learn's models. Split the data into training, validation and test sets: use 10% of the total data for the test set and, of the remaining 90%, use 80% for training and 20% for validation. Set random_state=0 when splitting the data. Train a linear regression model and compute the MAE on the validation set. The function should return 1) the trained model and 2) the validation MAE.
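
The two-stage split described above can be sketched on placeholder data to check the resulting proportions. The arrays here are synthetic (100 evenly spaced values), chosen only so the set sizes are easy to read; they are not the copper data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic placeholder data: 100 samples, one feature
X = np.arange(100, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0

# Stage 1: hold out 10% of all data as the test set
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=0.10, random_state=0
)

# Stage 2: split the remaining 90% into 80% training and 20% validation
X_train, X_val, y_train, y_val = train_test_split(
    X_train_full, y_train_full, test_size=0.20, random_state=0
)

print(len(X_train), len(X_val), len(X_test))
```

Relative to the full dataset, the training, validation and test sets end up as 72%, 18% and 10%.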

def lin_model(df):
    # YOUR CODE HERE
    X = copperdata['X']
    np.asarray('X')
    y = copperdata['y']
    np.asarray('y')
    X.values.reshape(-1, 1)
    y.values.reshape(-1, 1)
    X_train_full,X_test,_y_train_full,y_test = train_test_split(X,y,random_state=0,test_size = .10)
    X_train,X_val,y_train,y_val = train_test_split(X_train_full,y_train_full,random_state = 0, test_size = .20)
    model = LinearRegression()
    model.fit(X_train,y_train)
    val_preds = model.predict(X_val)
    mae = metrics.mean_absolute_error(y_test,y_pred)
    return model, mae

    raise NotImplementedError()

When I try to pass this test cell:

model,mae = lin_model(copperdata)

I get this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-112-8931ac554101> in <module>
      1 # Test cell
----> 2 model,mae = lin_model(copperdata)
      3 
      4 # Print model coefficients and intercept
      5 m = model.coef_

<ipython-input-111-ba2035c8e718> in lin_model(df)
      6     model.fit(X_train,y_train)
      7     #val_preds = model.predict(X_val)
----> 8     y_preds = model.predict(X_test)
      9     mae = metrics.mean_absolute_error(y_test,y_pred)
     10     return model, mae

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/linear_model/_base.py in predict(self, X)
    234             Returns predicted values.
    235         """
--> 236         return self._decision_function(X)
    237 
    238     _preprocess_data = staticmethod(_preprocess_data)

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/linear_model/_base.py in _decision_function(self, X)
    216         check_is_fitted(self)
    217 
--> 218         X = check_array(X, accept_sparse=['csr', 'csc', 'coo'])
    219         return safe_sparse_dot(X, self.coef_.T,
    220                                dense_output=True) + self.intercept_

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
     70                           FutureWarning)
     71         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72         return f(**kwargs)
     73     return inner_f
     74 

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    617             # If input is 1D raise error
    618             if array.ndim == 1:
--> 619                 raise ValueError(
    620                     "Expected 2D array, got 1D array instead:\narray={}.\n"
    621                     "Reshape your data either using array.reshape(-1, 1) if "

ValueError: Expected 2D array, got 1D array instead:
array=[656.2  544.47 524.7   60.41 447.41  89.57].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
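
The message refers to the shape of the array passed to predict(). A small sketch of the difference, using the first three values from the error output as sample data: calling reshape() without assigning the result leaves the original 1D data untouched.

```python
import pandas as pd

# First three values from the array shown in the error message
s = pd.Series([656.2, 544.47, 524.7])

# reshape returns a new array; without an assignment the result is discarded
s.values.reshape(-1, 1)
print(s.values.shape)   # still 1D

# Assigning the result keeps the 2D array (one column, one row per sample)
X = s.values.reshape(-1, 1)
print(X.shape)
```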

CodePudding user response:

reshape() returns a new array and does not modify X in place, so you have to assign the result back (X = ...); otherwise X stays 1D. Only the explanatory variable X needs to be reshaped; y can remain 1D.

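from sklearn import metrics
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

def lin_model(df):
    # Use the df argument; reshape returns a new array, so assign it to X
    X = df['X'].values.reshape(-1, 1)
    y = df['y'].values

    # Hold out 10% of the data for the test set
    X_train_full, X_test, y_train_full, y_test = train_test_split(X, y, random_state=0, test_size=0.10)
    # Split the remaining 90% into 80% training and 20% validation
    X_train, X_val, y_train, y_val = train_test_split(X_train_full, y_train_full, random_state=0, test_size=0.20)

    model = LinearRegression()
    model.fit(X_train, y_train)

    # The task asks for the MAE on the validation set
    val_preds = model.predict(X_val)
    mae = metrics.mean_absolute_error(y_val, val_preds)
    return model, mae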
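
The test cell in the traceback goes on to read model.coef_ and the intercept. A quick way to sanity-check those attributes and the MAE is to fit on data where the answer is known. This is only a sketch on synthetic, noise-free data (y = 2x + 1), not the copper dataset, and it skips the train/validation split for brevity.

```python
import numpy as np
from sklearn import metrics
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free placeholder data: y = 2x + 1
X = np.linspace(0.0, 10.0, 50).reshape(-1, 1)   # 2D: (n_samples, 1)
y = 2.0 * X.ravel() + 1.0                       # 1D target is fine

model = LinearRegression().fit(X, y)
preds = model.predict(X)
mae = metrics.mean_absolute_error(y, preds)

# Slope and intercept should be close to 2 and 1, and the MAE close to 0
print(model.coef_, model.intercept_, mae)
```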