I'm trying to run an SVM classifier, but it runs endlessly: it's been 6 hours now and it's still running. This is the code:
import numpy as np
from sklearn import preprocessing, model_selection, neighbors, svm
from sklearn.metrics import confusion_matrix
from sklearn import metrics
import pandas as pd
import cv2
import os
from random import shuffle
from tqdm import tqdm
df = pd.read_csv('dataset_binary.csv')
df.replace('?',-99999, inplace=True)
X = np.array(df.drop(['label'], axis=1))
y = np.array(df['label'])
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.1)
print('Test Train Split Done')
clf = svm.SVC(kernel = 'linear')
clf.fit(X_train, y_train)
svm_predictions = clf.predict(X_test)
print('Classification')
accuracy = clf.score(X_test, y_test)
print("Accuracy =", accuracy)
report = metrics.classification_report(y_test,clf.predict(X_test))
print("Report")
print(report)
su_vec = clf.support_vectors_
print('support vectors')
print(su_vec)
The CSV file, which is the dataset here, looks like this: [Screenshot of the CSV file]
The CSV file has 492,981 entries.
I'm running it on my laptop, which has a 9th-gen Core i7, 16 GB of RAM and a GTX 1660 Ti GPU, but I'm not using the GPU yet.
This seemed like pretty straightforward code to run, but it's been 6 hours and it's still running. What am I doing wrong here?
CodePudding user response:
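The code looks fine to me, and it already uses svm.SVC, which is scikit-learn's classification estimator (svm.SVR is the regression counterpart), so there is no lighter "SVC" to switch to. The likely bottleneck is the dataset size: SVC's fit time scales at least quadratically with the number of samples, so about 490,000 rows can take many hours. For a linear kernel, svm.LinearSVC is usually much faster on data this large. Running it on Google Colab is another option, though that won't change how the algorithm scales.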
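For a linear decision boundary on a dataset this large, one option worth trying is scikit-learn's LinearSVC in place of SVC(kernel='linear'). The sketch below is only an illustration: it uses synthetic data from make_classification as a stand-in for dataset_binary.csv (whose columns aren't shown), and the sample count, feature count and dual=False setting are assumptions, not values from the question.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in for dataset_binary.csv (assumption: the real data is
# numeric with a binary 'label' column).
X, y = make_classification(n_samples=20000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0
)

# LinearSVC is built on liblinear instead of libsvm and only fits linear models.
# dual=False is generally recommended when n_samples > n_features.
clf = LinearSVC(dual=False, random_state=0)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print("Accuracy =", accuracy)
```

Note that LinearSVC does not expose support_vectors_, so the last lines of the original script would need to be dropped or replaced (clf.coef_ holds the learned weights instead). Its default loss also differs from SVC's, so results may not match exactly.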
CodePudding user response:
If your data is continuous (i.e. not categorical or mixed-type), then you could help the SVM by scaling the data (e.g. with StandardScaler or MinMaxScaler). This should speed up the training.
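As a rough sketch of the scaling suggestion, the scaler can be chained with the classifier in a pipeline so that it is fit on the training split only. The data here is synthetic (an assumption, since the real CSV isn't available), and one feature is deliberately inflated to mimic columns on very different scales.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data; the first feature is inflated to mimic
# columns that live on very different scales.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X[:, 0] *= 1000.0
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0
)

# The pipeline fits StandardScaler on the training data only, then applies
# the same transformation before SVC at both fit and predict time.
model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
model.fit(X_train, y_train)

scaled_accuracy = model.score(X_test, y_test)
print("Accuracy with scaling =", scaled_accuracy)
```

One caveat for the code in the question: a placeholder such as -99999 for missing values would dominate the mean and variance that StandardScaler computes, so it may be better to drop or impute those rows before scaling.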