Expected 2D array, got 1D array instead: how do I solve it?

I've been trying to build a linear regression model using my cleaned dataset. Here is the dataset: https://docs.google.com/spreadsheets/d/1G7URct9yPAxEETLrb_F1McN-bgKp_r8cykmCOmojhT0/edit?usp=sharing

I've encoded the categorical columns with a label encoder and then split the data with train_test_split:

cols = ['nama_pasar','komoditas']

for col in cols:
  df_test[col] = LE.fit_transform(df_test[col])
  print(LE.classes_)

X = df_test[['tanggal','nama_pasar','komoditas']]
y = df_test[['harga']]

from sklearn.model_selection import train_test_split
X_train, y_train, X_test, y_test = train_test_split(X, y, test_size = 0.5)

When I fit the model, no errors show up:

LR.fit(X_train, y_train)

But when I try to make a prediction with my linear regression model, it keeps raising this error:

LR.predict([1, 10, 4])

ValueError: Expected 2D array, got 1D array instead:
array=[ 1 10  4].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

I also tried changing the test size to 0.2, but then I get a different error:

ValueError: Found input variables with inconsistent numbers of samples: [37936, 9484]

How do I solve this?

I'm still learning the very basics of data science, so an explanation would be much appreciated.

Thank you.

CodePudding user response：

Your X is a 2D array (one row per sample, one column per feature) and the model was trained on it in that form, so predict also expects a 2D array, even for a single sample. Wrap the sample in an extra pair of brackets:

LR.predict([[1, 10, 4]])
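
If the sample already lives in a 1D NumPy array, reshape(1, -1) does the same job as the extra brackets, as the error message itself suggests. The following is a minimal sketch on made-up data; X, y, model and sample are hypothetical stand-ins, not the asker's DataFrame or fitted LR object.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in data: 6 samples, 3 features each
X = np.array([[1, 0, 0], [2, 1, 0], [3, 0, 1],
              [4, 1, 1], [5, 2, 0], [6, 2, 1]], dtype=float)
y = X @ np.array([2.0, 3.0, 5.0]) + 1.0

model = LinearRegression().fit(X, y)

# A single sample stored as a 1D array has shape (3,)
sample = np.array([1, 10, 4], dtype=float)

# reshape(1, -1) turns it into one row with three columns: shape (1, 3)
sample_2d = sample.reshape(1, -1)

# predict returns one value per row of its input
prediction = model.predict(sample_2d)
print(sample.shape, sample_2d.shape, prediction.shape)
```

Use reshape(-1, 1) only in the opposite situation, where the array holds many samples of a single feature.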

CodePudding user response：

As for your second question (and your comment): you unpacked the results of train_test_split in the wrong order. You wrote:

X_train, y_train, X_test, y_test = train_test_split(X, y, test_size = 0.2)

The correct order is:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)

It is always good practice to check the dimensions (shape) of your arrays and DataFrames as a sanity check.
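
Such a shape check also explains why the first attempt appeared to work. train_test_split returns the train and test parts of X first, then those of y, so with the wrong order the name y_train actually holds the test part of X. With test_size = 0.5 both parts have the same number of rows, and fit most likely accepted them without complaint for that reason; with 0.2 the row counts differ (37936 vs 9484 in your error message), so scikit-learn raises an error. Here is a rough sketch on synthetic data; the arrays and the *_bad names are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 samples, 3 features, 1 target column
X = np.arange(300).reshape(100, 3)
y = np.arange(100).reshape(100, 1)

# Correct order: both parts of X first, then both parts of y
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
print(X_train.shape, y_train.shape)  # (80, 3) (80, 1)
print(X_test.shape, y_test.shape)    # (20, 3) (20, 1)

# Wrong order from the question: "y_train_bad" receives the test part of X
X_train_bad, y_train_bad, X_test_bad, y_test_bad = train_test_split(
    X, y, test_size=0.2, random_state=0)
print(X_train_bad.shape, y_train_bad.shape)  # (80, 3) (20, 3)
```

Printing the shapes right after the split makes the mix-up visible immediately: a target with three columns and a different row count from X_train cannot be the harga column.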