I'm trying to create some auto-generated reports from networking equipment.
The sensors send the data in bytes, as expected with networking equipment. The "issue" I'm having is that I would like the y-axis to show the values either strictly in megabytes (MB) or, even better, in KB/MB/GB depending on the average stream size of the set.
For reference, when I view this specific query in the Graphite GUI, the y-axis of the render is in Mb.
Matplotlib, on the other hand, treats the values as plain floats and therefore defaults to scientific notation.
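For comparison, Matplotlib already ships an `EngFormatter` in `matplotlib.ticker` that replaces scientific notation with SI prefixes. A minimal sketch, assuming decimal units are acceptable: it uses powers of 1000, so 1,000,000 bytes is labelled 1 MB rather than 1,048,576, and it will not match a 1024-based KB/MB/GB scale. The `unit="B"` suffix and the three sample values are my choices for illustration.

```python
import matplotlib.pyplot as plt
import matplotlib.ticker as tkr

# Three values (in bytes) taken from the sample data
values = [973920.8333330154, 993574, 964486.5]

fig, ax = plt.subplots()
ax.plot(range(len(values)), values)

# EngFormatter labels ticks with SI prefixes (powers of 1000, not 1024)
eng = tkr.EngFormatter(unit="B")
ax.yaxis.set_major_formatter(eng)
# plt.show()  # uncomment to display the figure
```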
```python
# A small sample of the data
# (typically ~8400 rows are handled per iteration)
[
    [973920.8333330154, '2022-03-01 00:00:00'],
    [993574, '2022-03-02 00:00:00'],
    [964486.5, '2022-03-03 00:00:00'],
    [972345.4166669846, '2022-03-04 00:00:00'],
    [972770.9166665077, '2022-03-05 00:00:00'],
    [961019.75, '2022-03-06 00:00:00'],
    [957663.9166665077, '2022-03-07 00:00:00'],
    [957588.5, '2022-03-08 00:00:00'],
    [959108, '2022-03-09 00:00:00'],
    [959551.8333334923, '2022-03-10 00:00:00'],
]
```
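To plot this sample with the dates on the x-axis, the nested list can be loaded into a pandas DataFrame and the timestamp strings parsed into datetimes. A minimal sketch using the first five rows; the column names `bytes` and `timestamp` are assumptions, not part of the original data.

```python
import pandas as pd

# First five rows of the sample: [value_in_bytes, timestamp_string]
data = [
    [973920.8333330154, '2022-03-01 00:00:00'],
    [993574, '2022-03-02 00:00:00'],
    [964486.5, '2022-03-03 00:00:00'],
    [972345.4166669846, '2022-03-04 00:00:00'],
    [972770.9166665077, '2022-03-05 00:00:00'],
]

# Column names are illustrative assumptions
df = pd.DataFrame(data, columns=["bytes", "timestamp"])
df["timestamp"] = pd.to_datetime(df["timestamp"])  # strings -> datetime64
df = df.set_index("timestamp")                     # df.plot() now puts the dates on the x-axis

mean_bytes = df["bytes"].mean()
print(f"mean: {mean_bytes / 1024:.1f} KB")         # 1024-based, as in sizeof_fmt
```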
This is my first time working with plotting, so there might be a clever function for this that I have not found.
All ideas are welcome.
Update:
I have not yet been able to figure out the custom major formatter suggested by MR.T.
I have a prototype that converts the DataFrame values to MB before plotting, but I still need to test whether the mean is a good fit for my purpose.
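```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def sizeof_fmt(num):
    # Walk up the binary units until the value drops below 1024
    for x in ['bytes', 'KB', 'MB', 'GB', 'TB']:
        if num < 1024.0:
            return "%3.1f_%s" % (num, x)
        num /= 1024.0
    return "%3.1f_%s" % (num, 'PB')  # anything larger falls through to PB


df = pd.DataFrame(
    [[2097152, '2020-01-02 00:00:00'],
     [2831155.2, '2020-01-02 00:00:00'],
     [1077936128, '2020-01-02 00:00:00'],
     [np.nan, '2020-01-02 00:00:00'],  # np.NaN was removed in NumPy 2.0
     [np.nan, '2020-01-02 00:00:00']])

# Pick the unit from the column mean (mean() skips NaN values)
operator = sizeof_fmt(df[0].mean()).split('_')[1]
print(sizeof_fmt(df[0].mean()))
print(operator)

if operator == 'MB':
    df[[0]] = df[[0]].div(1024 * 1024)
print(df)

# Plot after the conversion so the y-axis shows the scaled values
df.plot()
plt.show()
```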
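The prototype above only rescales when the mean lands in MB. A hedged generalisation, assuming the same 1024-based units: divide by `1024**k` for whichever unit the summary statistic falls into, and allow `"median"` as well as `"mean"`, since a single very large transfer can pull the mean up by a whole unit while the median is less sensitive to it. `rescale_bytes` and `UNITS` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Hypothetical unit table, 1024-based like sizeof_fmt
UNITS = ['bytes', 'KB', 'MB', 'GB', 'TB', 'PB']


def rescale_bytes(series, stat="mean"):
    """Divide a byte-valued Series by 1024**k, choosing k from a summary statistic.

    stat is "mean" or "median"; both skip NaN values.
    Returns (scaled_series, unit_label).
    """
    ref = series.agg(stat)
    k = 0
    while abs(ref) >= 1024 and k < len(UNITS) - 1:
        ref /= 1024
        k += 1
    return series / 1024 ** k, UNITS[k]


s = pd.Series([2097152, 2831155.2, 1077936128, np.nan, np.nan])
scaled, unit = rescale_bytes(s)                    # mean is about 344 MB
scaled_med, unit_med = rescale_bytes(s, "median")  # median is 2.7 MB
print(unit, scaled.round(1).tolist())
```

Label the axis with the returned unit (for example `ax.set_ylabel(unit)`) so the scaled numbers stay interpretable. Unlike a tick formatter, this fixes the unit once, so it will not adapt when you zoom.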
Answer:
Your function is perfect, but you never apply it to the axis formatter. You can do this like so:
```python
import matplotlib.pyplot as plt
import matplotlib.ticker as tkr
import numpy as np
import pandas as pd


def sizeof_fmt(x, pos):
    # FuncFormatter calls this with the tick value x and its position pos
    if x < 0:
        return ""
    for x_unit in ['bytes', 'kB', 'MB', 'GB', 'TB']:
        if x < 1024.0:
            return "%3.1f %s" % (x, x_unit)
        x /= 1024.0
    return "%3.1f %s" % (x, 'PB')  # fallback for values beyond TB


# sample data
df = pd.DataFrame(
    [[12097152, '2020-01-02 00:00:00'],
     [31155.2, '2020-01-02 10:00:00'],
     [np.nan, '2020-01-02 12:00:00'],
     [1234, '2020-01-03 04:00:00'],
     [1877936128, '2020-01-03 05:10:00']])
df[1] = pd.to_datetime(df[1])

ax = df.plot(x=1, y=0)
# Format every major y tick with sizeof_fmt instead of the default ScalarFormatter
ax.yaxis.set_major_formatter(tkr.FuncFormatter(sizeof_fmt))
plt.show()
```
The advantage of a formatter is that the ticks are still shown in the correct format after you zoom in, because every new tick goes through the same function.
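To see what the formatter does per tick, `FuncFormatter` can be called directly with a tick value and a position; Matplotlib makes the same call for each tick it draws. A small sketch reusing the answer's `sizeof_fmt`, with a `PB` fallback added as an assumption for values beyond TB.

```python
import matplotlib.ticker as tkr


def sizeof_fmt(x, pos):
    # Same formatter as in the answer, plus a PB fallback
    if x < 0:
        return ""
    for x_unit in ['bytes', 'kB', 'MB', 'GB', 'TB']:
        if x < 1024.0:
            return "%3.1f %s" % (x, x_unit)
        x /= 1024.0
    return "%3.1f %s" % (x, 'PB')


fmt = tkr.FuncFormatter(sizeof_fmt)

# Whatever tick values the locator produces (before or after a zoom),
# each one is converted by the same call
print(fmt(512, None))          # 512.0 bytes
print(fmt(1536, None))         # 1.5 kB
print(fmt(3 * 1024**2, None))  # 3.0 MB
print(fmt(1877936128, None))   # 1.7 GB
```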
You could also write your own Locator to ensure that the ticks always fall on multiples of 1024, but I have never looked into this.
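One possible shape for such a locator, under my own assumptions: `BinaryUnitLocator` is a hypothetical subclass of `matplotlib.ticker.Locator` that picks the largest unit `1024**k` (capped at TB) for which the visible range still spans at least two units, lets `MaxNLocator(integer=True)` choose whole-number ticks in that unit, and scales them back to bytes. Each tick is then a multiple of `1024**k`, which pairs naturally with `sizeof_fmt`. For ranges narrower than two bytes, `MaxNLocator` falls back to fractional ticks.

```python
import matplotlib.ticker as tkr


class BinaryUnitLocator(tkr.Locator):
    """Hypothetical locator: ticks on whole multiples of 1024**k bytes."""

    def __init__(self, nbins=6, min_units=2):
        # Inner decimal locator that works in the chosen binary unit
        self._inner = tkr.MaxNLocator(nbins=nbins, integer=True)
        self._min_units = min_units

    def __call__(self):
        # Called by Matplotlib on every redraw, e.g. after a zoom
        vmin, vmax = self.axis.get_view_interval()
        return self.tick_values(vmin, vmax)

    def tick_values(self, vmin, vmax):
        span = abs(vmax - vmin)
        unit = 1.0
        # Step up through kB, MB, GB, TB while the range still spans min_units
        while span / (unit * 1024) >= self._min_units and unit < 1024**4:
            unit *= 1024
        # Integer ticks in that unit, scaled back to bytes
        return self._inner.tick_values(vmin / unit, vmax / unit) * unit


# Usage alongside the answer's formatter:
# ax.yaxis.set_major_locator(BinaryUnitLocator())
# ax.yaxis.set_major_formatter(tkr.FuncFormatter(sizeof_fmt))
```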