I'm having an issue at the moment, and I think I'm making it far more complicated than it needs to be. My CSV file is 31 columns by 500 rows. I need to import it, split it in a 70/30 ratio, and then use the first column as the 'y' value for a neural network, with the remaining 30 columns as the 'x' value.
I've implemented the code below to do this, but when I run it through my basic sigmoid and testing functions, it returns results in a weird format, e.g. [6.54694655e-06].
I believe this is due to how I split and import the data, which I think I have done wrong. I need to import the data into arrays that my functions can read, and to separate the first column specifically as the 'y' value. How do I go about this?
df = pd.read_csv(r'data.csv', header=None)
df.to_numpy()
#splitting data 70/30
trainingdata= df[:329]
testingdata= df[:141]
#converting data to separate arrays for training and testing
training_features= trainingdata.loc[:, trainingdata.columns != 0].values.reshape(329,30)
training_labels = trainingdata[0]
training_labels = training_labels.values.reshape(329,1)
testing_features = testingdata[0]
testing_labels = testingdata.loc[:, testingdata.columns != 0]
CodePudding user response:
To split a DataFrame into train and test data, I usually use sklearn.model_selection.train_test_split; see the scikit-learn documentation for details.
Other methods exist as well. Hope this helps!
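One alternative to train_test_split is plain positional slicing with iloc. The sketch below assumes a 470-row, 31-column frame (matching the 329 + 141 rows in the question) and uses random numbers as a stand-in for data.csv; the names shuffled, n_train, train and test are illustrative only. Note that in the question, df[:141] selects the first 141 rows again, so the test set overlaps the training set; slicing from n_train onward avoids that.

```python
import numpy as np
import pandas as pd

# Stand-in for pd.read_csv(r'data.csv', header=None): 470 rows, 31 columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((470, 31)))

# Shuffle the rows once so the split does not depend on file order.
shuffled = df.sample(frac=1.0, random_state=0)

# Integer arithmetic for the 70 % cut-off: 329 of 470 rows.
n_train = len(shuffled) * 7 // 10

train = shuffled.iloc[:n_train]   # first 70 % of the shuffled rows
test = shuffled.iloc[n_train:]    # remaining 30 %, no overlap with train

print(train.shape, test.shape)
```

Slicing with iloc[:n_train] and iloc[n_train:] guarantees that every row lands in exactly one of the two sets.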
CodePudding user response:
Make your train/test split easy by using sklearn.model_selection.train_test_split.
If you don't have sklearn installed, first install it by running pip install -U scikit-learn.
Then:
import pandas as pd
from sklearn.model_selection import train_test_split
df = pd.read_csv(r'data.csv', header=None)
# X is your features, y is your target column
X = df.loc[:,1:]
y = df.loc[:,0]
# Use train_test_split function with test size of 30%
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
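The selected features and target are still pandas objects at this point, while a hand-written network usually expects NumPy arrays with the labels as a column vector. A minimal sketch of that conversion follows, again using random numbers as a stand-in for data.csv; X_arr and y_arr are hypothetical names. The same two calls should apply unchanged to X_train, X_test, y_train and y_test.

```python
import numpy as np
import pandas as pd

# Stand-in for pd.read_csv(r'data.csv', header=None): 470 rows, 31 columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((470, 31)))

X = df.loc[:, 1:]   # columns 1..30 are the features
y = df.loc[:, 0]    # column 0 is the target

# to_numpy() returns a new array, so the result has to be assigned.
X_arr = X.to_numpy(dtype=float)                  # 2-D array, one row per sample
y_arr = y.to_numpy(dtype=float).reshape(-1, 1)   # column vector of labels

print(X_arr.shape, y_arr.shape)
```

Using reshape(-1, 1) instead of a hard-coded row count keeps the code valid if the number of rows changes.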
CodePudding user response:
df = pd.read_csv(r'data.csv')
data = df.to_numpy()  # to_numpy() returns a new array, so keep the result
print(data)
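On the output format itself: a value such as [6.54694655e-06] is an ordinary one-element NumPy array printed in scientific notation (roughly 0.0000065), so it does not by itself indicate a broken import. The sketch below shows one way to print it in fixed-point form; the variable names out and text are illustrative.

```python
import numpy as np

# The value quoted in the question, as a one-element array.
out = np.array([6.54694655e-06])

# Default printing switches to scientific notation for small values.
print(out)

# suppress_small=True asks NumPy for fixed-point notation instead.
text = np.array2string(out, suppress_small=True)
print(text)
```

If a plain Python float is wanted rather than an array, out[0] or out.item() extracts the single element.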
CodePudding user response:
Use:
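# assumes df was read with header=None, so the columns are labelled 0..30
train = df.sample(frac=.7)
# every row that was not sampled into train goes to test
test = df.drop(train.index)
X_train = train.loc[:, 1:]
y_train = train.loc[:, 0]  # column 0 is the target
X_test = test.loc[:, 1:]
y_test = test.loc[:, 0]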