Train and test dataset

Time:03-20

Here's the dataset I'm using:

from sklearn.model_selection import train_test_split
from sklearn.datasets import load_boston
import warnings
    
with warnings.catch_warnings():
    # Suppress the warning that load_boston emits
    warnings.filterwarnings("ignore")
    data = load_boston()
X = data.data
y = data.target

First, I converted the target data into a classification dataset, following the prompt:

Step 1: Split the range of target values into three equal parts - low, mid, and high.

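y = list(y)
# Relabel samples in ascending order of target value. The placeholders
# 100/200/300 are larger than every original target (which tops out at 50),
# so min(y) always picks the smallest value not yet relabelled.
for i in range(len(y)):
    index = y.index(min(y))
    if i < len(y)/3:
        y[index] = 100      # lowest third
    elif i < 2*(len(y)/3):
        y[index] = 200      # middle third
    else:
        y[index] = 300      # highest third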

Step 2: Reassign the target values into three categorical values 0, 1, and 2, representing the low, mid, and high ranges of values, respectively.

def numerial(y):
    if y == 100:
        return 0
    elif y == 200:
        return 1
    else:
        return 2

y = map(numerial, y)

Step 3: Split the dataset into 70% training set and 30% test set.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 99)

When I run Step 3, I keep getting this error:

TypeError: Singleton array array(<map object at 0x7fac066a57c0>, dtype=object) cannot be considered a valid collection.

Which of these steps did I get wrong to cause this error? Any ideas?

CodePudding user response:

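In Python 3, map returns a lazy iterator rather than a list, so train_test_split cannot determine how many samples y contains. Convert the result to a list first: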

y = list(map(numerial, y))
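
To see why the conversion matters, here is a minimal sketch on made-up labels. It assumes, judging by the wording of the error message, that scikit-learn ends up wrapping the map object in a NumPy array; to_class is a hypothetical stand-in for numerial, not your exact function.

```python
import numpy as np

labels = [100, 200, 300, 100]  # made-up stand-ins for the recoded targets

def to_class(value):
    # Hypothetical helper that mirrors numerial() from the question
    return {100: 0, 200: 1}.get(value, 2)

# map() is lazy: nothing is computed yet and the object has no length
lazy = map(to_class, labels)

# NumPy does not iterate over a map object, so it stores it as a single
# element of a 0-dimensional object array (the "singleton array")
wrapped = np.asarray(lazy)
print(wrapped.ndim)    # 0

# list() consumes the iterator and yields a real sequence
materialized = list(map(to_class, labels))
as_array = np.asarray(materialized)
print(as_array.shape)  # (4,)
```

Once y is a list (or an array), it has a well-defined number of samples and can be split alongside X.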

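You can also simplify your code with pandas. The loop in Step 1 puts roughly the same number of samples in each class, which qcut does in one call on the original target values:

import pandas as pd
y = pd.qcut(data.target, q=3, labels=False)

If you instead want three equal-width intervals over the range of target values, which is how Step 1 is worded, use cut:

y = pd.cut(data.target, bins=3, labels=False)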
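
Note that pd.cut with bins=3 makes three equal-width intervals, while pandas also offers pd.qcut, which makes bins with roughly equal counts. Here is a small sketch of the difference on made-up values (not the Boston data), so you can pick the one that matches your prompt:

```python
import numpy as np
import pandas as pd

# Made-up values with one large outlier to make the difference visible
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])

# Equal-width bins: the range 1..20 is cut into three intervals of the
# same width, so the five small values all land in the first bin
equal_width = pd.cut(values, bins=3, labels=False)
print(list(equal_width))  # [0, 0, 0, 0, 0, 2]

# Equal-count bins: edges are placed at the 1/3 and 2/3 quantiles,
# so each class receives two of the six samples
equal_count = pd.qcut(values, q=3, labels=False)
print(list(equal_count))  # [0, 0, 1, 1, 2, 2]
```

With labels=False both functions return plain integer codes, which can be passed straight to train_test_split as y.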