How can I use SVM classifier to detect outliers in percentage changes?

Time:11-05

I have a pandas dataframe that is in the following format:

[image: screenshot of the dataframe]

It contains the daily % change in stock price for three companies: MSFT, F and BAC.

I would like to use a OneClassSVM model to detect whether each row is an outlier or not. I have tried the following code, which I believe finds the rows that contain outliers.

# Import libraries
import numpy as np
from sklearn.svm import OneClassSVM
import matplotlib.pyplot as plt

# Create the one-class SVM
svm = OneClassSVM(kernel='rbf', gamma=0.001, nu=0.03)

# Fit on delta and predict a label for every row
svm.fit(delta)
pred = svm.predict(delta)

# Rows predicted as -1 are outliers
outliers = np.where(pred == -1)
# Print the positions of the outlier rows
print(outliers)

This gives the following output:

[image: printed output]

I would like to then add a new column to my dataframe that includes whether the data is an outlier or not. I have tried the following code but I get an error due to the lists being different lengths as shown below.

condition = (delta.index.isin(outliers))

assigned_value = "outlier"

df['isoutlier'] = np.select(condition, 
assigned_value)

[image: error message]

Could you let me know how I can add this column, given that the list of rows containing outliers is much shorter than the dataframe?

CodePudding user response:

It's not very clear what delta and df are in your code. I am assuming they are the same data frame.

You can use the result from svm.predict directly. Here the new column is left as an empty string '' for rows that are not outliers:

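import numpy as np
import pandas as pd
from sklearn.svm import OneClassSVM

# Random example data standing in for the real dataframe
df = pd.DataFrame(np.random.uniform(0, 1, (100, 3)), columns=['A', 'B', 'C'])

svm = OneClassSVM(kernel='rbf', gamma=0.001, nu=0.03)
svm.fit(df)
pred = svm.predict(df)

# pred has one entry per row, so np.where yields a full-length column
df['isoutlier'] = np.where(pred == -1, 'outlier', '')
print(df)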

           A         B         C isoutlier
0   0.869475  0.752420  0.388898          
1   0.177420  0.694438  0.129073          
2   0.011222  0.245425  0.417329          
3   0.791647  0.265672  0.401144          
4   0.538580  0.252193  0.142094          
..       ...       ...       ...       ...
95  0.742192  0.079426  0.676820   outlier
96  0.619767  0.702513  0.734390          
97  0.872848  0.251184  0.887500   outlier
98  0.950669  0.444553  0.088101          
99  0.209207  0.882629  0.184912          
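
As for why the original attempt failed: np.select expects a list of conditions and a matching list of choices, not a single mask and a single string. In addition, np.where(pred == -1) returns integer row positions, while delta.index.isin(...) compares index labels, so the two only line up if the index happens to be 0, 1, 2, and so on. Below is a minimal sketch of two ways around this, assuming delta has a date index. The data here is randomly generated as a stand-in, and the names flagged, positions and labels are only illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.svm import OneClassSVM

# Hypothetical stand-in for the question's `delta`: daily % changes for three tickers
rng = np.random.default_rng(0)
dates = pd.date_range("2021-01-01", periods=100, freq="D")
delta = pd.DataFrame(rng.normal(0, 1, (100, 3)), index=dates,
                     columns=["MSFT", "F", "BAC"])

svm = OneClassSVM(kernel="rbf", gamma=0.001, nu=0.03)
pred = svm.fit_predict(delta)  # one label per row: +1 inlier, -1 outlier

# Option 1: a boolean column, aligned row by row with delta
flagged = delta.copy()
flagged["isoutlier"] = pred == -1

# Option 2: start from the positions returned by np.where, as in the question
positions = np.where(pred == -1)[0]            # integer positions, not index labels
labels = np.full(len(delta), "", dtype=object)  # start with every row unlabelled
labels[positions] = "outlier"
flagged["label"] = labels

# Show only the rows flagged as outliers
print(flagged[flagged["isoutlier"]])
```

A boolean column is usually easier to filter on afterwards, for example flagged[flagged["isoutlier"]], while the string label matches the column asked for in the question.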