I have a dataset built from three CSV files, and I need the unique values and their counts in one specific column (Label) so I can plot them later. It works perfectly except for DDoS, which shows up twice. I already checked the raw data and everything looks fine. How can I solve this issue?
dataset['Label'].value_counts()
Probe 98129
DDoS 73529
Normal 68424
DoS 53616
DDoS 48413
BFA 1405
Web-Attack 192
BOTNET 164
U2R 17
Name: Label, dtype: int64
CodePudding user response:
It seems the two labels differ in a way that is invisible to you. Consider the following snippet:
import pandas as pd
labels = pd.Series(['DDoS','DDoS','DDoS','DDoS '])
print(labels.value_counts())
output
DDoS 3
DDoS 1
dtype: int64
This looks similar to your case. However, when you do
print(labels.value_counts().index)
output
Index(['DDoS', 'DDoS '], dtype='object')
That explains the situation: there is 'DDoS' and 'DDoS ' (i.e. DDoS followed by a space), which are different strings. If this is the case, you can use .str.strip as follows:
dataset['Label'].str.strip().value_counts()
Note that this removes any leading/trailing whitespace (such as \t), not only spaces.
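To confirm that hidden whitespace really is the cause before stripping, it can help to print the repr() of each unique label. The following is a minimal sketch on made-up data: the Series contents (including the trailing space and tab) are assumptions for illustration, not values taken from the asker's CSV files.

```python
import pandas as pd

# Hypothetical stand-in for dataset['Label']; the trailing space and
# tab are assumed here purely to reproduce the symptom.
labels = pd.Series(["DDoS", "DDoS ", "Probe", "DDoS\t", "Probe"], name="Label")

# repr() makes otherwise invisible characters visible,
# e.g. a trailing space or an escaped \t inside the quotes.
raw_values = [repr(value) for value in labels.unique()]
print(raw_values)

# Strip leading/trailing whitespace, then count again:
# the variants of DDoS collapse into a single entry.
cleaned_counts = labels.str.strip().value_counts()
print(cleaned_counts)
```

If the repr() output shows more than one spelling of the same label, assigning the stripped column back (dataset['Label'] = dataset['Label'].str.strip()) should keep later counts and plots consistent.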
CodePudding user response:
Maybe the two DDoS values are not the same (a whitespace character somewhere?). You can try:
dataset['Label'].str.strip().value_counts()