I'm studying anomaly detection for speech data. My original code was written with an LSTM, but I ran into an imbalanced dataset, so I'm trying to get some insights from PyOD.
While trying PyOD's sample data, I copied and pasted their code into my Colab notebook, but I get the error "ValueError: 'c' argument has 1000 elements, which is inconsistent with 'x' and 'y' with size 500."
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pyod.utils.data import generate_data
contamination = 0.1 # percentage of outliers 10%
n_train = 500 # number of training points
n_test = 500 # number of testing points
n_features = 2 # number of features
X_train, y_train, X_test, y_test = generate_data(
n_train=n_train, n_test=n_test, n_features= n_features, contamination=contamination)
# Make the 2d numpy array a pandas dataframe for easy manipulation
X_train_pd = pd.DataFrame(X_train)
# print(X_train_pd)
# print(y_train)
# Plot
plt.scatter(X_train_pd[0], X_train_pd[1], c=y_train, alpha=0.8)
plt.title('Scatter plot pythonspot.com')
plt.xlabel('x')
plt.ylabel('y')
plt.show()
CodePudding user response:
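It seems that c=y_train is the source of the error. The c option sets the marker colors and needs one entry per point, but here y_train has 1000 elements for 500 points. That is what you would see if y_train actually held a 500x2 feature array: as far as I know, recent PyOD versions return the values of generate_data in the order X_train, X_test, y_train, y_test, so unpacking them as X_train, y_train, X_test, y_test would put X_test into y_train. Printing y_train.shape should confirm this. Just to make the program run without the error (though it may not be what you want, since every point gets the same color), change the call to: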
plt.scatter(X_train_pd[0], X_train_pd[1], c=[(1,0,0)]*len(X_train_pd[0]), alpha=0.8)
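If the unpacking order is indeed the cause, the labels can be kept as colors by fixing the order rather than replacing c. Below is a minimal sketch of that diagnosis. It uses a hypothetical stand-in, fake_generate_data, built with NumPy only, which mimics the return order I am assuming for recent PyOD versions (X_train, X_test, y_train, y_test); it is not PyOD's own function, and the random labels are only placeholders.

```python
import numpy as np

def fake_generate_data(n_train=500, n_test=500, n_features=2, seed=0):
    """Hypothetical stand-in for pyod.utils.data.generate_data.

    Assumption: values come back as X_train, X_test, y_train, y_test.
    """
    rng = np.random.default_rng(seed)
    X_train = rng.normal(size=(n_train, n_features))
    X_test = rng.normal(size=(n_test, n_features))
    y_train = rng.integers(0, 2, size=n_train)  # placeholder 0/1 labels
    y_test = rng.integers(0, 2, size=n_test)
    return X_train, X_test, y_train, y_test

# Order used in the question: the second value is really X_test,
# so "y_train" ends up as a (500, 2) array with 1000 elements.
X_train, y_train_wrong, X_test_wrong, y_test = fake_generate_data()
print(y_train_wrong.shape, y_train_wrong.size)  # (500, 2) 1000

# Assumed correct order: y_train is one label per training point.
X_train, X_test, y_train, y_test = fake_generate_data()
print(y_train.shape, y_train.size)  # (500,) 500

# A quick check worth running before passing labels to c= in plt.scatter.
labels_match = y_train.ndim == 1 and y_train.shape[0] == X_train.shape[0]
print(labels_match)  # True
```

With the real library, the equivalent change would be to write X_train, X_test, y_train, y_test = generate_data(...) and leave plt.scatter(..., c=y_train, alpha=0.8) as it was, so inliers and outliers still get different colors.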