How to import a CSV file, split it 70/30 and then use first column as my 'y' value?

I'm having an issue at the moment, and I think I'm making it far more complicated than it needs to be. My CSV file is 31 columns by 500 rows. I need to import it, split it in a 70/30 ratio, and then use the first column as my 'y' value for a neural network, with the remaining 30 columns as my 'x' value.

I've implemented the code below to do this, but when I run it through my basic sigmoid and testing functions, it gives results in a weird format, e.g. [6.54694655e-06].

I believe this is due to how I split and import the data, which I think I have done wrong. I need to import the data into arrays that my functions can read, and separate the first column specifically as the 'y' value. How do I go about this?

df = pd.read_csv(r'data.csv', header=None)
df.to_numpy()

#splitting data 70/30
trainingdata= df[:329]
testingdata= df[:141]

#converting data to separate arrays for training and testing
training_features= trainingdata.loc[:, trainingdata.columns != 0].values.reshape(329,30)
training_labels =  trainingdata[0]
training_labels = training_labels.values.reshape(329,1)

testing_features = testingdata[0]
testing_labels = testingdata.loc[:, testingdata.columns != 0]

CodePudding user response：

To split a DataFrame into training and test data, I usually use sklearn.model_selection.train_test_split; see its documentation for the details. Other splitting methods exist as well. Hope this helps!
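One of those other methods is a plain positional split. Note that in the question, df[:141] takes the first 141 rows again, so the test set overlaps the training set, and the testing features and labels are swapped. Below is a minimal sketch of a non-overlapping split, assuming column 0 is the label and the rest are features; the random DataFrame is only a hypothetical stand-in for data.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv: 500 rows, column 0 is the label,
# columns 1..30 are the features (same layout as header=None gives).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((500, 31)))

# First 70% of the rows for training, the remaining 30% for testing.
n_train = len(df) * 7 // 10
train = df.iloc[:n_train]
test = df.iloc[n_train:]

# Column 0 is y, every other column is X.
X_train = train.iloc[:, 1:].to_numpy()
y_train = train.iloc[:, [0]].to_numpy()  # column vector, shape (n_train, 1)
X_test = test.iloc[:, 1:].to_numpy()
y_test = test.iloc[:, [0]].to_numpy()

print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
```

Because this keeps the file's row order, it is only appropriate if the rows are not sorted in some meaningful way; otherwise shuffle first.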

CodePudding user response：

Make your train/test split easy by using sklearn.model_selection.train_test_split. If you don't have scikit-learn installed, first install it by running pip install -U scikit-learn.

Then

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv(r'data.csv', header=None)

# X is your features, y is your target column
X = df.loc[:,1:] 
y = df.loc[:,0]

# Use train_test_split function with test size of 30%
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
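train_test_split returns pandas objects here, while the question asks for arrays with the labels shaped (n, 1). A rough sketch of that last conversion step follows. The random DataFrame is a hypothetical stand-in for data.csv, and the *_arr names are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for data.csv (500 rows, 31 columns, no header).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((500, 31)))

X = df.loc[:, 1:]  # features: columns 1..30
y = df.loc[:, 0]   # target: column 0
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)

# Convert to NumPy arrays; reshape the labels into column vectors (n, 1).
X_train_arr = X_train.to_numpy()
X_test_arr = X_test.to_numpy()
y_train_arr = y_train.to_numpy().reshape(-1, 1)
y_test_arr = y_test.to_numpy().reshape(-1, 1)

print(X_train_arr.shape, y_train_arr.shape)
```

Using reshape(-1, 1) avoids hard-coding the row count, so the code keeps working if the file grows or the split ratio changes.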

CodePudding user response：

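import pandas as pd

df = pd.read_csv(r'data.csv', header=None)  # header=None as in the question: no header row
# to_numpy() returns a new array and leaves df unchanged, so keep the result
data = df.to_numpy()
print(data)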
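On the printed format: a value such as [6.54694655e-06] is ordinary scientific notation for 0.00000654694655 inside a one-element array, so it may simply be a very small sigmoid output rather than a sign of a broken import. A small sketch of how the display can be changed, using only the value quoted in the question:

```python
import numpy as np

value = np.array([6.54694655e-06])

# By default NumPy prints very small values in scientific notation.
print(value)

# Optionally print in fixed-point notation within this block only.
with np.printoptions(suppress=True, precision=8):
    print(value)

# The stored number is the same either way; only the printed form differs.
as_fixed = np.format_float_positional(value[0])
print(as_fixed)
```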

CodePudding user response：

Use:

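# randomly sample 70% of the rows for training
train = df.sample(frac=.7)
# the remaining 30% (rows not in train) form the test set
test = df.drop(train.index)
# columns 1 onwards are the features, column 0 is the target
X_train = train.loc[:, 1:]
y_train = train.loc[:, 0]
X_test = test.loc[:, 1:]
y_test = test.loc[:, 0]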
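The sampling approach above can be checked on synthetic data. This is a minimal sketch, assuming a 500 x 31 frame as a hypothetical stand-in for data.csv. The random_state argument is an addition here so that the same rows are picked on every run.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv (500 rows, 31 columns, no header).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((500, 31)))

# Sample 70% of the rows; random_state makes the split repeatable.
train = df.sample(frac=0.7, random_state=0)
# Everything not sampled becomes the test set.
test = df.drop(train.index)

# Sanity checks: the two sets share no rows and together cover the frame.
no_overlap = train.index.intersection(test.index).empty
covers_all = len(train) + len(test) == len(df)
print(len(train), len(test), no_overlap, covers_all)
```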