I am trying to plot the distribution of y_train as a bar chart, but I get the error below. Please help me fix it; I have been stuck on this since yesterday.
from sklearn.model_selection import train_test_split
import numpy as np
X = reviews['Text']
y= reviews['Score'].values
X_train, X_test, y_train, y_test =train_test_split(X,y ,test_size=0.20,stratify=y,random_state=33)
Checking the shape of the split data:
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)
(80000,) (80000,)
(20000,) (20000,)
#plot bar graphs of y_train and y_test
import matplotlib.pyplot as plt
plt.bar([1,0],y_train.value_counts().values,color ='green')
plt.xlabel("Count")
plt.ylabel("y_train values")
plt.title("Distribution of y_train")
plt.show()
error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-18-62460aedca56> in <module>()
1 #plot bar graphs of y_train and y_test
2 import matplotlib.pyplot as plt
----> 3 plt.bar(y_train.value_counts().values,color ='green')
4
5 plt.xlabel("Count")
AttributeError: 'numpy.ndarray' object has no attribute 'value_counts'
CodePudding user response:
The problem is the line y = reviews['Score'].values. According to the documentation, .values returns a NumPy representation of the data, so y (and therefore y_train) is a numpy.ndarray rather than a pandas object.
You are calling value_counts on a NumPy array, and NumPy arrays do not provide that method. value_counts is a pandas method, available on Series and DataFrame objects.
Try changing that line to the following:
y = reviews['Score']
The type of y is then pandas.core.series.Series. train_test_split keeps y_train and y_test as Series, so the value_counts call in your plotting code should work.
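As a rough sketch of both ways out, the snippet below counts class labels once with a pandas Series and once with a plain NumPy array via np.unique. The scores data is a made-up stand-in for reviews['Score'], and plot_class_counts is a hypothetical helper name, not something from the original post. It also takes the bar positions from the counted labels instead of hard-coding [1,0], since value_counts orders its result by frequency and that order may not match a fixed list.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for reviews['Score']; the real data is not shown in the post.
scores = pd.Series([1, 0, 1, 1, 0, 1, 1, 1], name="Score")

# Option 1: keep the pandas Series, so value_counts() is available.
# sort_index() orders the result by class label rather than by frequency.
counts = scores.value_counts().sort_index()
labels_pd = counts.index.to_numpy()
counts_pd = counts.to_numpy()

# Option 2: keep the NumPy array (.values) and count with np.unique instead.
labels_np, counts_np = np.unique(scores.values, return_counts=True)


def plot_class_counts(labels, counts, title):
    """Draw one bar per class label.

    matplotlib is imported inside the function so the counting code
    above can run without it.
    """
    import matplotlib.pyplot as plt

    plt.bar([str(label) for label in labels], counts, color="green")
    plt.xlabel("Score")
    plt.ylabel("Count")
    plt.title(title)
    plt.show()
```

With the real data, something like plot_class_counts(*np.unique(y_train, return_counts=True), "Distribution of y_train") should work whether y_train is a Series or an ndarray.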
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.values.html
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.value_counts.html?highlight=value_counts