Python How to create train/test data

Time:05-18

I need to create test and train sets from a single dataset. However, I can't quite get it to work with sklearn.

My Target variable: SalePrice
train = pd.read_csv(r'C:\Users\pkoni\Desktop\train.csv')
target = train['SalePrice']
X, y = train.data, train.target
train_X, test_X, train_y, test_y = train_test_split(X, y, 
                                                    train_size=0.5,
                                                    test_size=0.5,
                                                    random_state=123)

I don't know what I should assign to X and y.


CodePudding user response:

Not sure I understand fully. If you are just trying to randomly split then this should work:

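from sklearn.model_selection import train_test_split

# Label: the column to predict. Features: every other column.
y = train['SalePrice']
X = train.drop('SalePrice', axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.5,
                                                    random_state=0)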

If you instead want to split by date, with all points from a certain year onward (e.g. 2010) going to test and all earlier points going to train, then a different solution is needed:

# Build test first, because the second line overwrites train.
test = train[train['Yr.Sold'] >= 2010]
train = train[train['Yr.Sold'] < 2010]

After splitting into test and train, you can assign labels and features for each subset in the same way as y and X in the first code segment.
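
To make that last step concrete, here is a minimal sketch on a small made-up DataFrame. The LotArea column, the sample values, and the names df, cutoff, train_df and test_df are assumptions for illustration only; swap in your own data and column names.

```python
import pandas as pd

# Toy stand-in for the real train.csv; all values are made up for illustration.
df = pd.DataFrame({
    'Yr.Sold': [2007, 2008, 2009, 2010, 2011, 2012],
    'LotArea': [8450, 9600, 11250, 9550, 14260, 14115],
    'SalePrice': [208500, 181500, 223500, 140000, 250000, 143000],
})

cutoff = 2010  # assumed cutoff year; rows from this year onward go to test

# Filter both subsets from the original frame, so neither filter
# operates on an already-filtered frame.
train_df = df[df['Yr.Sold'] < cutoff]
test_df = df[df['Yr.Sold'] >= cutoff]

# Separate the label (SalePrice) from the features in each subset.
y_train = train_df['SalePrice']
X_train = train_df.drop('SalePrice', axis=1)
y_test = test_df['SalePrice']
X_test = test_df.drop('SalePrice', axis=1)

print(X_train.shape, X_test.shape)
```

Using separate names (train_df, test_df) instead of overwriting train avoids any dependence on the order of the two filter lines.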
