Recently I read a paper that offers a type of visualization for network traffic connections, shown below:
What is the name of this type of visualization, and how can we plot it using the UGR16 dataset?
I would also like to understand the anomaly they present with this visualization in the 2nd and 5th bursts during a 5-minute observation on August 1st, shown below. Further info about the dataset can be found here.
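To look at the bursts themselves, one rough approach (an assumption, not the method from the paper) is to count flows per short time bin inside the 5-minute window. This sketch assumes the `Date_time` column name assigned in the code below. `flows_per_bin` is a hypothetical helper, and the 10-second bin width is an arbitrary choice.

```python
import pandas as pd

def flows_per_bin(df, start, minutes=5, bin_size="10s"):
    """Count flow records per time bin in [start, start + minutes).

    Hypothetical helper: assumes df has a 'Date_time' column that
    pd.to_datetime can parse. Empty bins are reported as 0.
    """
    times = pd.to_datetime(df["Date_time"])
    start = pd.Timestamp(start)
    width = pd.Timedelta(bin_size)
    n_bins = int(pd.Timedelta(minutes=minutes) / width)

    # Offset of every flow from the window start; keep only flows inside the window
    offset = times - start
    in_window = (offset >= pd.Timedelta(0)) & (offset < n_bins * width)

    # Integer bin index per flow, then count flows per bin (missing bins -> 0)
    bin_idx = offset[in_window] // width
    counts = bin_idx.value_counts().reindex(range(n_bins), fill_value=0)

    # Label each bin with its start time
    counts.index = pd.DatetimeIndex([start + i * width for i in range(n_bins)])
    return counts
```

Plotting the result with `counts.plot(kind="bar")` gives a rough burst profile for the window. Deciding which bins count as a "burst" is still a judgement call.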
I have tried the following implementation, but I couldn't figure out how to get this type of plot for the anomaly context:
```python
import requests
import pandas as pd
import matplotlib.pyplot as plt

# Get UGR16 dataset (first 5k rows of July, week 5)
DOWNLOAD_REPO = "https://raw.githubusercontent.com/krisbolton/machine-learning-for-security/master/"
DATASET_FILENAME = "ugr16-july-week5-first5k.csv"
DOWNLOAD_FILENAME = DOWNLOAD_REPO + DATASET_FILENAME  # full download URL

response = requests.get(DOWNLOAD_FILENAME)
response.raise_for_status()
with open(DATASET_FILENAME, "wb") as f:
    f.write(response.content)
print("Download complete.")

# View dataset info
# Note: 4999 rows from a "first5k" file suggests the first record was read as a
# header; if the CSV has no header row, pd.read_csv(..., header=None) keeps it.
df = pd.read_csv(DATASET_FILENAME)
df.info()
print(df.shape)  # (4999, 13)

# View visual representation of dataset columns
df.hist(bins=50, figsize=(30, 15))
plt.show()

# Give the 13 columns readable names
df.columns = ['Date_time', 'Duration', 'Source_IP',
              'Destination_IP', 'Source_Port', 'Destination_Port',
              'Protocol', 'Flag', 'Forwarding_status', 'ToS',
              'Packets', 'Bytes', 'Label']
df.head()
```
This is what the dataframe looks like:
Answer:
It's called a hive plot; it originated in Krzywinski et al. The idea is to first sort the nodes into a few significant groups (using traditional network analysis techniques) and use those groups as axes, then place each node along its group's axis and draw the edges between and within groups.
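The layout idea can be sketched in plain Python: one radial axis per group, a node's distance from the centre taken from some numeric property (degree, for example), and edges drawn as curves. This is a minimal illustration, not the API of any hive plot library. `hive_layout`, `edge_curve` and `draw_hive` are hypothetical names, and the radius scaling and the quadratic Bézier bend are arbitrary choices.

```python
import math

def hive_layout(groups, values, n_axes=3):
    """Map each node to (x, y) on its group's axis.

    groups: {node: axis index in range(n_axes)}
    values: {node: number}, rescaled to a radius in [0.1, 1]
    """
    lo, hi = min(values.values()), max(values.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    pos = {}
    for node, axis in groups.items():
        angle = 2 * math.pi * axis / n_axes
        r = 0.1 + 0.9 * (values[node] - lo) / span  # keep nodes off the centre
        pos[node] = (r * math.cos(angle), r * math.sin(angle))
    return pos

def edge_curve(p, q, bend=0.25, steps=20):
    """Sample a quadratic Bezier curve from p to q.

    The control point is offset perpendicular to the p->q direction, so edges
    between axes and edges within one axis both bow visibly.
    """
    mx, my = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2
    dx, dy = q[0] - p[0], q[1] - p[1]
    cx, cy = mx - bend * dy, my + bend * dx
    pts = []
    for i in range(steps + 1):
        t = i / steps
        a, b, c = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2
        pts.append((a * p[0] + b * cx + c * q[0], a * p[1] + b * cy + c * q[1]))
    return pts

def draw_hive(pos, edges, n_axes=3, ax=None):
    """Draw axes, curved edges and nodes with matplotlib."""
    import matplotlib.pyplot as plt  # lazy import: layout code needs no plotting library
    if ax is None:
        _, ax = plt.subplots(figsize=(6, 6))
    for k in range(n_axes):
        angle = 2 * math.pi * k / n_axes
        ax.plot([0, math.cos(angle)], [0, math.sin(angle)], color="black", lw=1)
    for u, v in edges:
        xs, ys = zip(*edge_curve(pos[u], pos[v]))
        ax.plot(xs, ys, color="tab:blue", alpha=0.4, lw=0.8)
    xs, ys = zip(*pos.values())
    ax.scatter(xs, ys, s=12, color="tab:red", zorder=3)
    ax.set_aspect("equal")
    ax.axis("off")
    return ax
```

Usage would be `draw_hive(hive_layout(groups, values), edges)` followed by `plt.show()`. A dedicated library also handles details this sketch skips, such as the doubled axes the original design uses for within-group edges.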
Here is a Python library for hive plotting, and here's a nice blog post applying it to the karate club dataset from NetworkX.
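To feed UGR16 flows into a hive plot, the flow table first has to become a graph with a group and a position for every IP. The sketch below assumes the column names from the question's code. It assigns axes by traffic direction (0 = only sends, 1 = only receives, 2 = both) and positions nodes by out-degree plus in-degree. That grouping is an assumption chosen for illustration, not the grouping used in the paper. `ugr16_hive_inputs` is a hypothetical helper.

```python
import pandas as pd

def ugr16_hive_inputs(df, src="Source_IP", dst="Destination_IP"):
    """Turn flow records into (groups, values, edges) for a hive plot.

    groups: {ip: 0 if it only sends, 1 if it only receives, 2 if both}
    values: {ip: out-degree + in-degree over distinct (src, dst) pairs}
    edges:  list of distinct (src, dst) tuples
    """
    edges = df[[src, dst]].drop_duplicates()
    out_deg = edges.groupby(src)[dst].nunique()
    in_deg = edges.groupby(dst)[src].nunique()

    groups, values = {}, {}
    for node in set(out_deg.index) | set(in_deg.index):
        o, i = int(out_deg.get(node, 0)), int(in_deg.get(node, 0))
        groups[node] = 0 if i == 0 else (1 if o == 0 else 2)
        values[node] = o + i
    return groups, values, list(edges.itertuples(index=False, name=None))
```

To compare normal and anomalous periods, one option is to filter `df` to the 5-minute window (or to one `Label` value) before building these inputs. The three returned objects can then be passed to the library linked above or to a hand-rolled layout.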