Can we do density plotting with timestamps in Python?

Time:02-11

My question is somewhat similar to this one.

Let me explain further. First, here is the data we can use.

a_str = '''
timestamp      count
2021-08-16     20
2021-08-17     60
2021-08-18     35
2021-08-19      1
2021-08-20      0
2021-08-21      1
2021-08-22     50
2021-08-23     36
2021-08-24     68
2021-08-25    125
2021-08-26     54'''

a_str1 = '''
timestamp      count
2021-07-16     20
2021-07-17     60
2021-07-18     35
2021-07-19      1
2021-07-20      0
2021-07-21      1
2021-07-22     50
2021-07-23     36
2021-07-24     68
2021-07-25    125
2021-07-26     54'''

a_str2 = '''
timestamp      count
2021-06-16     20
2021-06-17     60
2021-06-18     35
2021-06-19      1
2021-06-20      0
2021-06-21      1
2021-06-22     50
2021-06-23     36
2021-06-24     68
2021-06-25    125
2021-06-26     54'''

As we know, a density plot shows how densely the values are distributed (x-axis: the values, y-axis: their density), like this one (this is just an example).

What I want

(The plot does not have to look like the one I posted above; it is just to give an idea of what I want to do.)

  1. The x-axis shows the actual values of count
  2. I want to plot the "density" of those values
  3. The y-axis is the month

First of all, is this even possible? I am using a density plot, so it might not be possible with that.

Thanks in advance.

CodePudding user response:

The following code draws KDE curves at y-values 0, 1, 2, ... Depending on your concrete situation, you might want to fine-tune some of the constants.

from matplotlib import pyplot as plt
from matplotlib.colors import to_rgb
import pandas as pd
import numpy as np
from scipy.stats import gaussian_kde

a1 = pd.DataFrame({'timestamp': pd.date_range('20210816', periods=11),
                   'count': np.random.randint(10, 21, 11)})
a2 = pd.DataFrame({'timestamp': pd.date_range('20210716', periods=11),
                   'count': np.random.randint(10, 21, 11)})
a3 = pd.DataFrame({'timestamp': pd.date_range('20210616', periods=11),
                   'count': np.random.randint(10, 21, 11)})
month_counts = [df['count'].values for df in [a1, a2, a3]]
month_names = ['August', 'July', 'June']

max_count = max([count_i.max() for count_i in month_counts])
min_count = min([count_i.min() for count_i in month_counts])
xs = np.linspace(min_count - 3, max_count + 3, 200)
month_kde = [gaussian_kde(count_i, bw_method=0.2) for count_i in month_counts]
max_kde = max([kde_i(xs).max() for kde_i in month_kde])
overlap_factor = 1.9
whiten_factor = 0.5

fig, ax = plt.subplots(figsize=(12, 8))
for index, color in zip(range(len(month_names) - 1, -1, -1), np.tile(plt.cm.Set2.colors, (3, 1))):
    kde = month_kde[index](xs) / max_kde * overlap_factor
    ax.plot(xs, index + kde, lw=2, color=color, zorder=50 - index)
    whitened = np.array(to_rgb(color)) * (1 - whiten_factor) + whiten_factor
    ax.fill_between(xs, index, index + kde, color=whitened, alpha=0.8, zorder=50 - index)
ax.set_xlim(xs[0], xs[-1])
ax.set_xlabel('Distribution of Counts')
ax.set_yticks(np.arange(len(month_names)))
ax.set_yticklabels(month_names)
for spine in ('top', 'left', 'right'):
    ax.spines[spine].set(visible=False)
plt.tight_layout()
plt.show()

Figure: KDE curves for multiple months
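
One of the constants worth tuning is bw_method, which controls how smooth each curve is. The sketch below is only an illustration: it uses one month of counts from the question and compares the bw_method=0.2 used above with scipy's default bandwidth. The variable names narrow and default are made up for this example.

```python
import numpy as np
from scipy.stats import gaussian_kde

# One month of counts, copied from the question's sample data
counts = np.array([20, 60, 35, 1, 0, 1, 50, 36, 68, 125, 54])
xs = np.linspace(counts.min() - 3, counts.max() + 3, 200)

# A scalar bw_method is used directly as the bandwidth factor:
# smaller values follow the individual data points more closely
narrow = gaussian_kde(counts, bw_method=0.2)(xs)

# Without bw_method, scipy picks the factor with Scott's rule,
# which gives a smoother and flatter curve for this sample
default = gaussian_kde(counts)(xs)

print('peak with bw_method=0.2:', narrow.max())
print('peak with default bandwidth:', default.max())
```

A smaller factor gives spikier ridges that separate clusters such as the near-zero days; a larger one gives smoother curves that may hide them.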

Here is a variation using the magma color map as a gradient:

month_names = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']
month_counts = [np.random.randint(15 - abs(k - 6), 30 - abs(k - 6), 30) for k in range(len(month_names))]

max_count = max([count_i.max() for count_i in month_counts])
min_count = min([count_i.min() for count_i in month_counts])
xs = np.linspace(min_count - 3, max_count + 3, 200)
month_kde = [gaussian_kde(count_i, bw_method=0.2) for count_i in month_counts]
max_kde = max([kde_i(xs).max() for kde_i in month_kde])
overlap_factor = 1.9

fig, ax = plt.subplots(figsize=(12, 8))
for index in range(len(month_names)):
    kde = month_kde[::-1][index](xs) / max_kde * overlap_factor
    ax.plot(xs, index + kde, lw=2, color='black', zorder=50 - 2 * index + 1)
    fill_poly = ax.fill_between(xs, index, index + kde, color='none', alpha=0.8)

    verts = np.vstack([p.vertices for p in fill_poly.get_paths()])
    gradient = ax.imshow(np.linspace(0, 1, 256).reshape(1, -1), cmap='magma', aspect='auto', zorder=50 - 2 * index,
                         extent=[verts[:, 0].min(), verts[:, 0].max(), verts[:, 1].min(), verts[:, 1].max()])
    gradient.set_clip_path(fill_poly.get_paths()[0], transform=plt.gca().transData)

ax.set_xlim(xs[0], xs[-1])
ax.set_ylim(ymin=-0.2)

ax.set_xlabel('Distribution of Counts')
ax.set_yticks(np.arange(len(month_names)))
ax.set_yticklabels(month_names[::-1])
for spine in ('top', 'left', 'right'):
    ax.spines[spine].set(visible=False)
plt.tight_layout()
plt.show()

Figure: KDE curves for 12 months, using the magma color map as a gradient
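
Both examples use random counts. To plot the data from the question instead, the text blocks could be parsed into month_counts and month_names roughly as follows. This is a sketch that assumes the whitespace-separated layout shown in the question; parse_counts is a hypothetical helper name, and only two of the three months are included to keep it short.

```python
import io
import pandas as pd

# Two of the text blocks from the question, copied verbatim
a_str = '''
timestamp      count
2021-08-16     20
2021-08-17     60
2021-08-18     35
2021-08-19      1
2021-08-20      0
2021-08-21      1
2021-08-22     50
2021-08-23     36
2021-08-24     68
2021-08-25    125
2021-08-26     54'''

a_str1 = '''
timestamp      count
2021-07-16     20
2021-07-17     60
2021-07-18     35
2021-07-19      1
2021-07-20      0
2021-07-21      1
2021-07-22     50
2021-07-23     36
2021-07-24     68
2021-07-25    125
2021-07-26     54'''


def parse_counts(text):
    # Hypothetical helper: read whitespace-separated columns and
    # convert the 'timestamp' column to datetimes
    return pd.read_csv(io.StringIO(text), sep=r'\s+', parse_dates=['timestamp'])


df = pd.concat([parse_counts(s) for s in (a_str, a_str1)], ignore_index=True)

# Label each row with its month name. Note: this merges equal months
# from different years, which is fine for this single-year sample.
df['month'] = df['timestamp'].dt.month_name()

# sort=False keeps the months in order of first appearance (August, July)
month_counts = [g['count'].to_numpy() for _, g in df.groupby('month', sort=False)]
month_names = list(df['month'].unique())

print(month_names)
```

The resulting month_counts and month_names can replace the randomly generated ones in either plotting snippet; the rest of the code stays the same.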
