I'm trying to preprocess a data frame before building a decision tree, but I'm getting an error about the dimension:
y should be a 1d array, got an array of shape (1, 1460) instead.
I tried using values = df_train[col].unique().flatten() instead, but the error was the same.
The code is the following:
for col in df_train.columns:
    values = df_train[col].unique()
    new_col = preprocessing.LabelEncoder()
    new_col.fit([values])
    col_num = df_train.columns.get_loc(col)
    df_train[:,col_num] = new_col.transform(df_train[:,col_num])
Example of columns:
Colour | Area |
---|---|
Red | 230 |
Yellow | 400 |
Thank you!
CodePudding user response:
Here's your code, fixed. Pass values to fit directly instead of wrapping it in an outer list (the extra list is what produces the 2-D shape in the error), and use .iloc for integer-position indexing:
import pandas as pd
from sklearn import preprocessing
df_train = pd.DataFrame({'A': ['cat', 'dog', 'cat', 'cactus'],
                         'B': ['gray', 'black', 'black', 'green']})
print(df_train)
        A      B
0     cat   gray
1     dog  black
2     cat  black
3  cactus  green
for col in df_train.columns:
    values = df_train[col].unique()
    new_col = preprocessing.LabelEncoder()
    new_col.fit(values)
    col_num = df_train.columns.get_loc(col)
    df_train.iloc[:,col_num] = new_col.transform(df_train.iloc[:,col_num])
print(df_train)
   A  B
0  1  1
1  2  0
2  1  0
3  0  2
But this is more complicated than it needs to be. It's better to use OrdinalEncoder, which encodes all columns in one call:
# Proper way
ord_enc = preprocessing.OrdinalEncoder()
X_train = ord_enc.fit_transform(df_train)
print(X_train)
[[1. 1.]
 [2. 0.]
 [1. 0.]
 [0. 2.]]
OrdinalEncoder is designed for transforming features (a 2-D table of columns), unlike LabelEncoder, which is meant for encoding the target (a single 1-D array).
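Since the columns in the question mix text (Colour) and numbers (Area), here is a minimal sketch of how the two encoders could be split between features and target. The data is made up, and the "Sold" target column is purely an assumption for illustration; only the non-numeric feature columns are encoded, so Area keeps its original values.

```python
import pandas as pd
from sklearn import preprocessing

# Hypothetical data modelled on the question's Colour/Area example.
# The "Sold" target column is an assumption added for illustration.
df = pd.DataFrame({
    "Colour": ["Red", "Yellow", "Red", "Green"],
    "Area": [230, 400, 310, 150],
    "Sold": ["yes", "no", "yes", "no"],
})

X = df[["Colour", "Area"]].copy()
y = df["Sold"]

# Encode only the non-numeric feature columns; numeric ones stay untouched.
cat_cols = X.select_dtypes(exclude="number").columns
ord_enc = preprocessing.OrdinalEncoder()
X[cat_cols] = ord_enc.fit_transform(X[cat_cols])

# LabelEncoder expects a 1-D array, which a single Series provides.
lab_enc = preprocessing.LabelEncoder()
y_encoded = lab_enc.fit_transform(y)

print(X)
print(y_encoded)
```

Both encoders assign codes in sorted order of the categories, so the mapping can be read back from ord_enc.categories_ and lab_enc.classes_.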
CodePudding user response:
I can't say for sure because you haven't shared the full output of the code, but I think it would help to transpose the array you pass to fit (the result of df_train[col].unique() wrapped in a list), which would convert it from shape (1, 1460) to (1460, 1). My guess is that 1460 should be your number of samples and 1 should be the number of columns.
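To make the shapes concrete, here is a small sketch with a made-up three-element array standing in for the result of df_train[col].unique(). It shows that wrapping the array in a list adds a leading dimension, that LabelEncoder rejects that shape, and that flattening back to 1-D with ravel() is one way to make fit succeed.

```python
import numpy as np
from sklearn import preprocessing

# Hypothetical stand-in for df_train[col].unique().
values = np.array(["Red", "Yellow", "Green"])

# Wrapping in a list, as in new_col.fit([values]), adds a leading dimension.
wrapped = np.array([values])
print(values.shape)     # (3,)
print(wrapped.shape)    # (1, 3)
print(wrapped.T.shape)  # (3, 1)

enc = preprocessing.LabelEncoder()

# A (1, n) array is not accepted as a 1-D target.
try:
    enc.fit(wrapped)
    fit_failed = False
except ValueError as exc:
    fit_failed = True
    print(exc)

# Flattening back to 1-D works.
enc.fit(wrapped.ravel())
print(enc.classes_)
```

Note that calling .flatten() on values itself changes nothing, since it is already 1-D; the extra dimension only appears once it is wrapped in [ ].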