I've been trying to build a linear regression model using my cleaned dataset. Here is my dataset: https://docs.google.com/spreadsheets/d/1G7URct9yPAxEETLrb_F1McN-bgKp_r8cykmCOmojhT0/edit?usp=sharing
I've processed the data with a label encoder and then split it with train_test_split:
cols = ['nama_pasar','komoditas']
for col in cols:
    df_test[col] = LE.fit_transform(df_test[col])
    print(LE.classes_)
X = df_test[['tanggal','nama_pasar','komoditas']]
y = df_test[['harga']]
from sklearn.model_selection import train_test_split
X_train, y_train, X_test, y_test = train_test_split(X, y, test_size = 0.5)
When I fit the data, no problems show up.
LR.fit(X_train, y_train)
But when I try to make a prediction with my linear regression model, it keeps showing this error:
LR.predict([1, 10, 4])
ValueError: Expected 2D array, got 1D array instead:
array=[ 1 10 4].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
I've also tried changing the test size to 0.2, but then fitting shows a different error from the first one:
ValueError: Found input variables with inconsistent numbers of samples: [37936, 9484]
How do I solve this?
I'm still learning the very basics of data science, so an explanation would be much appreciated.
Thank you.
CodePudding user response:
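Your X is a 2D array (one row per sample, one column per feature) and the model has been trained on it in that shape. Hence, when you want a prediction, you need to pass a 2D array as well, even for a single sample.
Try the following: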
LR.predict([[1, 10, 4]])
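To make the shape difference concrete, here is a minimal sketch. It does not use your spreadsheet: the data is made up (three numeric feature columns and a target that is an exact linear function of them), and the names `model`, `sample` and `sample_2d` are only for illustration. It shows that a 1D sample is rejected, and that both the nested list and `reshape(1, -1)` give the model a single row with three features.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up stand-in data (NOT the asker's dataset): 100 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.integers(0, 20, size=(100, 3)).astype(float)
# Target built as an exact linear function so the fit is easy to check.
y = 2.0 * X[:, 0] + 3.0 * X[:, 1] - 1.0 * X[:, 2] + 5.0

model = LinearRegression().fit(X, y)

sample = np.array([1, 10, 4])      # shape (3,): 1D
sample_2d = sample.reshape(1, -1)  # shape (1, 3): one sample, three features

# Passing the 1D array raises the ValueError from the question.
raised = False
try:
    model.predict(sample)
except ValueError:
    raised = True

# Both of these pass a 2D input and give the same prediction.
pred_nested = model.predict([[1, 10, 4]])
pred_reshaped = model.predict(sample_2d)

print(sample.shape, sample_2d.shape)
print(raised, pred_nested, pred_reshaped)
```

If you want predictions for several samples at once, pass one inner list (or one row) per sample.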
CodePudding user response:
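As for your second question (and your comment): you got the order of the values returned by train_test_split wrong. The function returns X_train, X_test, y_train, y_test, so in your code the variable y_train actually holds the test part of X. With test_size = 0.5 both parts happen to have the same number of rows, so fit did not complain (although it was fitting against the wrong data). With test_size = 0.2 the row counts differ (37936 vs. 9484), which is exactly the error you see. Your line: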
X_train, y_train, X_test, y_test = train_test_split(X, y, test_size = 0.2)
The correct order is:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)
It is always good practice to check the dimensions of your arrays/dataframes as a sanity check.
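A quick way to do such a check is to print `.shape` for every part of the split. The sketch below again uses made-up arrays rather than your data (100 rows, three feature columns, one target column), and `random_state=0` is only there to make the run repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data with the same layout as in the question: 3 features, 1 target.
X = np.arange(300).reshape(100, 3)
y = np.arange(100).reshape(100, 1)

# Correct order: both X parts first, then both y parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

for name, arr in [("X_train", X_train), ("X_test", X_test),
                  ("y_train", y_train), ("y_test", y_test)]:
    print(name, arr.shape)

# Features and targets of the same part must have the same number of rows.
assert X_train.shape[0] == y_train.shape[0]
assert X_test.shape[0] == y_test.shape[0]
```

If the unpacking order were wrong, the first assertion would fail here, because a "y_train" that is really X_test would have 20 rows instead of 80.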