When I don't enclose it in a function, the code works, but when I write it as a function, I get an undefined error with x and y values. Do you have any idea why? I've tried many things but couldn't fix it.
import pandas as pd
import numpy as np
from math import sqrt
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import LabelEncoder
A=pd.read_excel('example.xlsx')
X=A.iloc[:,0:6]
Y=A.iloc[:,6:7]
X1=A.iloc[:,0:1]
X2=A.iloc[:,1:2]
X3=A.iloc[:,3:6]
def one_hot_encoding():
    data=['apple','banana','orange']
    label = LabelEncoder()
    int_data = label.fit_transform(data)
    int_data = int_data.reshape(len(int_data), 1)
    onehot_data = OneHotEncoder(sparse=False)
    onehot_data = onehot_data.fit_transform(int_data)
    print("Categorical data encoded into integer values....\n")
    print(onehot_data)
one_hot_encoding()
def normalize_data(x,y):
    scaler = MinMaxScaler()
    x=pd.DataFrame(scaler.fit_transform(X),columns=X.columns, index=X.index)
    y=pd.DataFrame(scaler.fit_transform(Y),columns=Y.columns, index=Y.index)
    return x,y
normalize_data(x,y)
def split_data():
    normalize_data()
    x_train, x_test, y_train, y_test = train_test_split(x,y,train_size=0.85)
    retunr x_train.shape, x_test.shape, y_train.shape, y_test.shape
split_data()
CodePudding user response:
It would have helped to see the full error message. There is a syntax error in the last function (retunr instead of return). Make sure your references are right: lowercase x is a different name from uppercase X. When you call normalize_data, pass it the arguments its definition expects. Note that the function takes x and y as arguments but then reassigns them from the global X and Y, so whatever you pass in is ignored. Also keep scope in mind: names assigned inside a function are local to it, so using x and y outside the function raises a "not defined" error unless you store the returned values.
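To make the scope point concrete, here is a minimal sketch in plain Python. The normalize helper and the sample list are made up for illustration and are not the asker's code; it only shows that a name assigned inside a function is not visible outside it, and that the return value has to be stored.

```python
def normalize(values):
    # `scaled` is a local name: it exists only while this call runs.
    lo, hi = min(values), max(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled

X = [2.0, 4.0, 6.0]  # uppercase X is a different name from lowercase x

# Calling the function without storing the result throws the result away.
normalize(X)

try:
    print(scaled)  # the local name is not visible out here
except NameError as err:
    message = str(err)
    print("NameError:", message)

# Binding the return value to a name makes it available to later code.
x = normalize(X)
print(x)
```

The same applies to the asker's normalize_data: the x and y it returns only exist outside the function once they are assigned to something.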
CodePudding user response:
1.) You don't need to label-encode the variable before one-hot encoding. You can one-hot encode the strings directly.
2.) You are getting the error "x is not defined" because you return x and y from the second function and then use them directly in your third function. You have to store the returned values in variables first; then you can use them.
3.) There is a typo in the third function: it should be return instead of retunr.
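As a small sketch of point 1, assuming scikit-learn is installed: OneHotEncoder accepts string categories directly, but it expects two-dimensional input (one row per sample). The variable names here are only illustrative. The dense array is obtained with toarray() so the sketch does not depend on the keyword that controls sparse output, which I believe was renamed in newer scikit-learn releases.

```python
from sklearn.preprocessing import OneHotEncoder

# One row per sample, one column per feature: the encoder expects 2D input.
data = [["apple"], ["banana"], ["orange"]]

encoder = OneHotEncoder()
# fit_transform returns a sparse matrix by default; toarray() makes it dense.
onehot = encoder.fit_transform(data).toarray()

print(encoder.categories_[0])  # the categories the encoder learned
print(onehot)                  # one row per fruit, one column per category
```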
Below I have corrected the mistakes and it should work now! Hope it helps!
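from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

def one_hot_encoding():
    # OneHotEncoder expects 2D input: one row per sample
    data=[['apple'],['banana'],['orange']]
    onehot_data = OneHotEncoder()
    # the default output is sparse; toarray() converts it to a dense array
    onehot_data = onehot_data.fit_transform(data).toarray()
    print("Categorical data encoded into integer values....\n")
    print(onehot_data)
one_hot_encoding()

def normalize_data(x,y):
    # scale the arguments that were passed in, not the globals X and Y
    scaler = MinMaxScaler()
    x=pd.DataFrame(scaler.fit_transform(x),columns=x.columns, index=x.index)
    y=pd.DataFrame(scaler.fit_transform(y),columns=y.columns, index=y.index)
    return x,y
# pass the uppercase X and Y defined at the top and store the returned values
x1, y1 = normalize_data(X,Y)

def split_data():
    # x1 and y1 hold the values returned by normalize_data above
    x_train, x_test, y_train, y_test = train_test_split(x1, y1, train_size=0.85)
    print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)
split_data()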
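If you would rather not rely on module-level names at all, one possible variant passes the data through arguments and return values. This is only a sketch: the random DataFrame is a hypothetical stand-in for example.xlsx (6 feature columns, 1 target column), and random_state and the train_size default are my additions so the run is repeatable.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

def normalize_data(x, y):
    # Scale the frames that were passed in; no globals are read here.
    x_scaled = pd.DataFrame(MinMaxScaler().fit_transform(x), columns=x.columns, index=x.index)
    y_scaled = pd.DataFrame(MinMaxScaler().fit_transform(y), columns=y.columns, index=y.index)
    return x_scaled, y_scaled

def split_data(x, y, train_size=0.85):
    # Return the four pieces so the caller decides what to do with them.
    return train_test_split(x, y, train_size=train_size, random_state=0)

# Hypothetical stand-in for example.xlsx: 6 feature columns and 1 target column.
rng = np.random.default_rng(0)
A = pd.DataFrame(rng.random((20, 7)), columns=[f"c{i}" for i in range(7)])
X, Y = A.iloc[:, 0:6], A.iloc[:, 6:7]

x_norm, y_norm = normalize_data(X, Y)
x_train, x_test, y_train, y_test = split_data(x_norm, y_norm)
print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)
```

Each function now depends only on what it is given, so a "not defined" error can only come from the call site, which is easier to track down.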
Cheers!