I need to create train and test sets from a single dataset, but I can't quite get it working with sklearn.
My target variable: SalePrice
train = pd.read_csv(r'C:\Users\pkoni\Desktop\train.csv')
target = train['SalePrice']
X, y = train.data, train.target
train_X, test_X, train_y, test_y = train_test_split(X, y,
                                                    train_size=0.5,
                                                    test_size=0.5,
                                                    random_state=123)
I don't know what I should assign to X and y.
CodePudding user response:
I'm not sure I fully understand, but if you are just trying to split randomly, this should work:
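from sklearn.model_selection import train_test_split

# Labels are the target column; features are every other column
y = train['SalePrice']
X = train.drop('SalePrice', axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.5,
                                                    random_state=0)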
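To check what the random split produces, here is a minimal self-contained sketch. It assumes a tiny made-up DataFrame in place of train.csv: only the SalePrice column name comes from the question, and the other columns and all values are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for train.csv (values are made up for illustration)
train = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550, 14260, 14115, 10084, 10382],
    "Yr.Sold": [2008, 2007, 2008, 2006, 2008, 2009, 2007, 2009],
    "SalePrice": [208500, 181500, 223500, 140000,
                  250000, 143000, 307000, 200000],
})

# Labels are the target column; features are everything else
y = train["SalePrice"]
X = train.drop("SalePrice", axis=1)

# Half of the rows go to each part; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

print(X_train.shape, X_test.shape)
```

Note that train_test_split keeps the row index, so X_train and y_train stay aligned row by row.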
If you instead want all points after a certain date (e.g. 2010) to go to test and all points before it to go to train, a different solution is needed:
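# Later rows go to test, earlier rows to train.
# Rows from 2010 itself are placed in test here; adjust the comparison if needed.
test = train[train['Yr.Sold'] >= 2010]
train = train[train['Yr.Sold'] < 2010]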
After splitting into test and train this way, you can assign the labels and features for each part separately (see X and y in the first code segment).
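Putting the date-based split and the feature/label assignment together, a rough sketch could look like the following. The helper name split_by_year and the sample data are hypothetical, and it assumes rows from the cutoff year onward belong in the test set.

```python
import pandas as pd

def split_by_year(df, year_col, cutoff, target_col):
    """Rows with year < cutoff go to train; rows with year >= cutoff go to test."""
    is_test = df[year_col] >= cutoff
    train_part, test_part = df[~is_test], df[is_test]
    # Separate features (all other columns) from the label column in each part
    X_train = train_part.drop(columns=[target_col])
    y_train = train_part[target_col]
    X_test = test_part.drop(columns=[target_col])
    y_test = test_part[target_col]
    return X_train, X_test, y_train, y_test

# Hypothetical data; only the column names SalePrice and Yr.Sold come from the thread
data = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550, 14260, 14115],
    "Yr.Sold": [2008, 2009, 2010, 2010, 2007, 2011],
    "SalePrice": [208500, 181500, 223500, 140000, 250000, 143000],
})

X_train, X_test, y_train, y_test = split_by_year(
    data, "Yr.Sold", 2010, "SalePrice"
)
print(len(X_train), len(X_test))
```

Building the boolean mask once and using it for both parts avoids the pitfall of overwriting the original frame before the second filter is applied.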