Here's the dataset I'm using:
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_boston
import warnings
with warnings.catch_warnings():
    warnings.filterwarnings("ignore")
    data = load_boston()
X = data.data
y = data.target
First, I converted the target data into a classification dataset according to the prompt:
Step 1: Split the range of target values into three equal parts - low, mid, and high.
y = list(y)
for i in range(len(y)):
    index = y.index(min(y))
    if i < len(y)/3:
        y[index] = 100
    elif i > len(y)/3 and i < 2*(len(y)/3):
        y[index] = 200
    else:
        y[index] = 300
Step 2: Reassign the target values into three categorical values 0, 1, and 2, representing the low, mid, and high ranges of values, respectively.
def numerial(y):
    if y == 100:
        return 0
    elif y == 200:
        return 1
    else:
        return 2
y = map(numerial, y)
Step 3: Split the dataset into 70% training set and 30% test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 99)
When I run step 3, I keep getting an error saying:
TypeError: Singleton array array(<map object at 0x7fac066a57c0>, dtype=object) cannot be considered a valid collection.
Which of these steps did I get wrong to cause this error? Any ideas?
Answer:
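In Python 3, map returns a lazy iterator rather than a list, so train_test_split cannot determine its length and treats it as a single object. You can fix this by converting the result to a list: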
y = list(map(numerial, y))
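The difference is easy to check without scikit-learn. A minimal sketch on made-up values (the `sample` list is hypothetical, not the Boston data), showing that a bare map object has no length while the list version does:

```python
def numerial(y):
    # Same mapping as in the question: 100 -> 0, 200 -> 1, anything else -> 2
    if y == 100:
        return 0
    elif y == 200:
        return 1
    else:
        return 2

sample = [100, 300, 200, 100]  # hypothetical stand-in for the relabelled targets

# map() is lazy in Python 3: the result is an iterator, not a sequence
lazy = map(numerial, sample)
try:
    len(lazy)
    has_length = True
except TypeError:
    has_length = False

# Wrapping the call in list() materializes the values into a real sequence
labels = list(map(numerial, sample))

print(has_length)  # False
print(labels)      # [0, 2, 1, 0]
```

Anything that needs to know the number of samples up front, such as train_test_split, needs the list form.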
You can also simplify your code using the cut function from pandas (imported as pd):
y = pd.cut(y, [0, len(y)/3, 2*len(y)/3, len(y)], labels=[0, 1, 2])
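Or, even simpler, apply it to the original target values (bins=3 splits their range into three equal-width intervals, and labels=False returns the integer codes 0, 1, and 2):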
pd.cut(y, bins=3, labels=False)
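One thing to keep in mind: pd.cut with bins=3 splits the range of values into equal-width intervals, whereas the loop in the question puts roughly the same number of samples in each class by rank. If the equal-count behaviour is what you want, pd.qcut is probably the closest pandas equivalent. A small sketch on made-up values (not the Boston targets) to show how the two differ when the data has an outlier:

```python
import pandas as pd

values = [1, 2, 3, 4, 5, 6, 7, 8, 100]  # made-up data with one outlier

# Equal-width bins over the value range [1, 100]
equal_width = pd.cut(values, bins=3, labels=False).tolist()

# Equal-count bins (terciles), so each class gets about a third of the samples
equal_count = pd.qcut(values, q=3, labels=False).tolist()

print(equal_width)  # [0, 0, 0, 0, 0, 0, 0, 0, 2]
print(equal_count)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Which one is right depends on how you read the prompt: "three equal parts of the range" points to pd.cut, while balanced classes point to pd.qcut.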