I am working on the iris dataset. I was able to split it into training and test sets:
X_train, X_test, Y_train, Y_test = train_test_split(X,Y,test_size = .3, random_state = 50)
Now I want to export two separate CSV files, one for the training set and one for the test set.
training_set.csv will contain X_train and Y_train.
test_set.csv will contain X_test and Y_test.
I have tried this code block
training_set = pd.DataFrame(X_train, Y_train)
Which returned
sepal.width petal.length petal.width
variety
Setosa NaN NaN NaN
Setosa NaN NaN NaN
Setosa NaN NaN NaN
Virginica NaN NaN NaN
Virginica NaN NaN NaN
... ... ... ...
Versicolor NaN NaN NaN
Virginica NaN NaN NaN
Setosa NaN NaN NaN
Virginica NaN NaN NaN
Virginica NaN NaN NaN
105 rows × 3 columns
How should I proceed?
Thank you.
CodePudding user response:
From my answer here, load the dataset and convert it to a dataframe:
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
iris = load_iris()
# Stack features and target side by side; note the `+` that joins the two lists of column names
df = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                  columns=iris['feature_names'] + ['target']).astype({'target': int}) \
       .assign(species=lambda x: x['target'].map(dict(enumerate(iris['target_names']))))
X_train, X_test, y_train, y_test = \
train_test_split(df.iloc[:, :4], df['species'], test_size=.3, random_state=50)
training_set = pd.concat([X_train, y_train], axis=1)
test_set = pd.concat([X_test, y_test], axis=1)
training_set.to_csv('training.csv')
test_set.to_csv('test.csv')
Note: you can use the target (int) or species (str) column as the y vector.
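If you do not want the row index written into the files, something like the sketch below may be useful. It wraps the concat-then-save step in a helper and passes index=False to to_csv. The helper name save_split, the toy values, and the temporary output directory are assumptions made for illustration; the column names and the training_set.csv file name are taken from the question.

```python
import os
import tempfile

import pandas as pd


def save_split(features, labels, path):
    """Join features and labels column-wise and write them to one CSV file.

    `save_split` is a hypothetical helper, not part of pandas or scikit-learn.
    """
    # axis=1 places the label column next to the feature columns,
    # aligning rows on the index that both objects share after the split.
    combined = pd.concat([features, labels], axis=1)
    # index=False keeps the shuffled row labels out of the file.
    combined.to_csv(path, index=False)
    return combined


# Toy stand-ins for X_train / Y_train (made-up values, shuffled index as after a split).
X_train = pd.DataFrame(
    {"sepal.length": [5.1, 6.3, 5.8], "petal.length": [1.4, 6.0, 5.1]},
    index=[12, 101, 77],
)
Y_train = pd.Series(["Setosa", "Virginica", "Virginica"], index=[12, 101, 77], name="variety")

out_dir = tempfile.mkdtemp()  # assumption: write to a temporary directory
train_path = os.path.join(out_dir, "training_set.csv")
saved = save_split(X_train, Y_train, train_path)

# Read the file back to check what was written.
reloaded = pd.read_csv(train_path)
print(reloaded)
```

The same call with X_test and Y_test would produce test_set.csv. Whether to drop the index is a design choice: keep it (the default) if you need to trace rows back to the original dataset.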
CodePudding user response:
IIUC, you are trying to save the training and test datasets to CSV files. Is that correct?
Did you try this, and did it not work?
pd.DataFrame(X_train, Y_train).to_csv('training.csv')
OR
training_set.to_csv('training.csv')
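One caveat: the second positional argument of pd.DataFrame is index, so pd.DataFrame(X_train, Y_train) reindexes the features by the label values instead of adding a label column. Since labels such as 'Setosa' do not appear in the original integer index, every cell comes back as NaN, which matches the output in the question. The sketch below reproduces this on made-up toy data (the values are assumptions, not real iris rows) and compares it with pd.concat.

```python
import pandas as pd

# Toy stand-ins for X_train / Y_train (made-up values).
X_train = pd.DataFrame(
    {"sepal.width": [3.5, 3.3, 2.7], "petal.length": [1.4, 6.0, 5.1]},
    index=[0, 1, 2],
)
Y_train = pd.Series(["Setosa", "Virginica", "Virginica"], index=[0, 1, 2], name="variety")

# Y_train is taken as the new index, so X_train is reindexed by
# 'Setosa', 'Virginica', ... and no matching rows are found.
broken = pd.DataFrame(X_train, Y_train)

# concat along axis=1 keeps the features and adds the labels as a column.
fixed = pd.concat([X_train, Y_train], axis=1)

print(broken)
print(fixed)
```

Calling fixed.to_csv('training.csv') then writes the features and labels together.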