Why does value_counts on one column of the dataset merge every category except one (DDoS)?

I have a dataset built from three CSV files, and I need the unique values and their counts in one specific column (Label), to plot later. It works perfectly for every category except DDoS, which shows up twice. I have already checked the raw data and everything looks fine. How can I solve this issue?

dataset['Label'].value_counts()

Probe         98129
DDoS          73529
Normal        68424
DoS           53616
DDoS          48413
BFA            1405
Web-Attack      192
BOTNET          164
U2R              17
Name: Label, dtype: int64

CodePudding user response：

It seems that they differ in a way that is invisible to you. Consider the following snippet:

import pandas as pd
labels = pd.Series(['DDoS','DDoS','DDoS','DDoS '])
print(labels.value_counts())

output

DDoS     3
DDoS     1
dtype: int64

This looks similar to your case. However, when you do

print(labels.value_counts().index)

output

Index(['DDoS', 'DDoS '], dtype='object')

That explains the situation: there are 'DDoS' and 'DDoS ' (i.e. DDoS followed by a space), which are different strings. If this is the case, you might use .str.strip as follows

dataset['Label'].str.strip().value_counts()

Note that it will remove any leading/trailing whitespace (like \t), not only spaces.
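Since you want to plot the counts later, it may be worth storing the stripped labels back into the column rather than stripping only for the count. A minimal sketch, assuming the extra characters are leading or trailing whitespace; the small `dataset` here is a made-up stand-in for your real one:

```python
import pandas as pd

# Hypothetical stand-in for the real dataset: one label has a trailing
# space and another a leading tab.
dataset = pd.DataFrame({"Label": ["DDoS", "DDoS ", "Probe", "DDoS", "\tProbe"]})

# Write the cleaned labels back so every later step (counting, plotting)
# sees the merged categories.
dataset["Label"] = dataset["Label"].str.strip()

counts = dataset["Label"].value_counts()
print(counts)
```

After the assignment, the variants of each label should collapse into a single row of the counts.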

CodePudding user response：

Maybe the two DDoS values are not the same (a whitespace character somewhere?)

You can try:

dataset['Label'].str.strip().value_counts()
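To check whether a stray whitespace character really is the cause, one option is to print the repr of each unique label, which shows quotes and escape sequences. This is only a sketch on made-up data; `raw_labels` and `suspicious` are illustrative names, not part of your code:

```python
import pandas as pd

# Hypothetical sample; replace with the real dataset.
dataset = pd.DataFrame({"Label": ["DDoS", "DDoS ", "Normal", "DDoS"]})

# repr() wraps each value in quotes and escapes characters such as \t,
# so a trailing space or tab becomes visible.
raw_labels = [repr(value) for value in dataset["Label"].unique()]
print(raw_labels)

# Rows whose label changes when stripped are the ones carrying extra whitespace.
suspicious = dataset.loc[dataset["Label"] != dataset["Label"].str.strip(), "Label"]
print(suspicious.unique())
```

If `suspicious` comes back empty on your data, the difference is probably something other than leading/trailing whitespace, and the repr output should help show what it is.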