My question is somewhat similar to this one:
Let me explain further. First, this is the data we can use:
a_str = '''
timestamp count
2021-08-16 20
2021-08-17 60
2021-08-18 35
2021-08-19 1
2021-08-20 0
2021-08-21 1
2021-08-22 50
2021-08-23 36
2021-08-24 68
2021-08-25 125
2021-08-26 54'''
a_str1 = '''
timestamp count
2021-07-16 20
2021-07-17 60
2021-07-18 35
2021-07-19 1
2021-07-20 0
2021-07-21 1
2021-07-22 50
2021-07-23 36
2021-07-24 68
2021-07-25 125
2021-07-26 54'''
a_str2 = '''
timestamp count
2021-06-16 20
2021-06-17 60
2021-06-18 35
2021-06-19 1
2021-06-20 0
2021-06-21 1
2021-06-22 50
2021-06-23 36
2021-06-24 68
2021-06-25 125
2021-06-26 54'''
As we know, a density plot shows the density of each value (x-axis: the values, y-axis: the density), like this one:
What I want:
(The plot does not have to look like the one I posted above; it is only there to give an idea of what I want to do.)
- The x-axis shows the actual values of count
- I want to plot the "density" of those values
- The y-axis is the month
First of all, is this even possible? I am using a density plot, so I am not sure it can be done that way.
Thanks in advance.
CodePudding user response:
The following code draws one KDE curve per month at the y-values 0, 1, 2, ... Depending on your concrete situation, you might want to fine-tune some of the constants (for example bw_method, overlap_factor and whiten_factor).
from matplotlib import pyplot as plt
from matplotlib.colors import to_rgb
import pandas as pd
import numpy as np
from scipy.stats import gaussian_kde
# random test data: one DataFrame per month
a1 = pd.DataFrame({'timestamp': pd.date_range('20210816', periods=11),
                   'count': np.random.randint(10, 21, 11)})
a2 = pd.DataFrame({'timestamp': pd.date_range('20210716', periods=11),
                   'count': np.random.randint(10, 21, 11)})
a3 = pd.DataFrame({'timestamp': pd.date_range('20210616', periods=11),
                   'count': np.random.randint(10, 21, 11)})
month_counts = [df['count'].values for df in [a1, a2, a3]]
month_names = ['August', 'July', 'June']
max_count = max([count_i.max() for count_i in month_counts])
min_count = min([count_i.min() for count_i in month_counts])
xs = np.linspace(min_count - 3, max_count + 3, 200)
month_kde = [gaussian_kde(count_i, bw_method=0.2) for count_i in month_counts]
max_kde = max([kde_i(xs).max() for kde_i in month_kde])
overlap_factor = 1.9
whiten_factor = 0.5
fig, ax = plt.subplots(figsize=(12, 8))
for index, color in zip(range(len(month_names) - 1, -1, -1), np.tile(plt.cm.Set2.colors, (3, 1))):
    # scale so the tallest curve has height overlap_factor, then shift it up to its row
    kde = month_kde[index](xs) / max_kde * overlap_factor
    ax.plot(xs, index + kde, lw=2, color=color, zorder=50 - index)
    # blend the line color with white to get a lighter fill color
    whitened = np.array(to_rgb(color)) * (1 - whiten_factor) + whiten_factor
    ax.fill_between(xs, index, index + kde, color=whitened, alpha=0.8, zorder=50 - index)
ax.set_xlim(xs[0], xs[-1])
ax.set_xlabel('Distribution of Counts')
ax.set_yticks(np.arange(len(month_names)))
ax.set_yticklabels(month_names)
for spine in ('top', 'left', 'right'):
    ax.spines[spine].set(visible=False)
plt.tight_layout()
plt.show()
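The code above uses random counts. To feed in the data from the question instead, the strings could be parsed roughly as follows. This is only a sketch: parse_counts is a made-up helper name, and only the August string is shown; a_str1 and a_str2 have the same layout and would be handled the same way.

```python
import io
import pandas as pd

# Data copied from the question (August).
a_str = '''
timestamp count
2021-08-16 20
2021-08-17 60
2021-08-18 35
2021-08-19 1
2021-08-20 0
2021-08-21 1
2021-08-22 50
2021-08-23 36
2021-08-24 68
2021-08-25 125
2021-08-26 54'''


def parse_counts(text):
    """Hypothetical helper: read a whitespace-separated 'timestamp count' table."""
    return pd.read_csv(io.StringIO(text.strip()), sep=r'\s+', parse_dates=['timestamp'])


august = parse_counts(a_str)
# Append the July and June arrays/names in the same way before plotting.
month_counts = [august['count'].values]
month_names = [august['timestamp'].dt.month_name().iloc[0]]
```

With real counts that range from 0 to 125, the bandwidth (bw_method) and the padding of 3 around the x-range will probably need adjusting.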
Here is a variation using the magma color map as a gradient:
month_names = ['January', 'February', 'March', 'April', 'May', 'June',
'July', 'August', 'September', 'October', 'November', 'December']
month_counts = [np.random.randint(15 - abs(k - 6), 30 - abs(k - 6), 30) for k in range(len(month_names))]
max_count = max([count_i.max() for count_i in month_counts])
min_count = min([count_i.min() for count_i in month_counts])
xs = np.linspace(min_count - 3, max_count + 3, 200)
month_kde = [gaussian_kde(count_i, bw_method=0.2) for count_i in month_counts]
max_kde = max([kde_i(xs).max() for kde_i in month_kde])
overlap_factor = 1.9
fig, ax = plt.subplots(figsize=(12, 8))
for index in range(len(month_names)):
    # months are drawn bottom-up, so December ends up at y=0 and January at the top
    kde = month_kde[::-1][index](xs) / max_kde * overlap_factor
    ax.plot(xs, index + kde, lw=2, color='black', zorder=50 - 2 * index + 1)
    # invisible fill: only its outline is needed, as clip path for the gradient image
    fill_poly = ax.fill_between(xs, index, index + kde, color='none', alpha=0.8)
    verts = np.vstack([p.vertices for p in fill_poly.get_paths()])
    gradient = ax.imshow(np.linspace(0, 1, 256).reshape(1, -1), cmap='magma', aspect='auto', zorder=50 - 2 * index,
                         extent=[verts[:, 0].min(), verts[:, 0].max(), verts[:, 1].min(), verts[:, 1].max()])
    gradient.set_clip_path(fill_poly.get_paths()[0], transform=plt.gca().transData)
ax.set_xlim(xs[0], xs[-1])
ax.set_ylim(ymin=-0.2)
ax.set_xlabel('Distribution of Counts')
ax.set_yticks(np.arange(len(month_names)))
ax.set_yticklabels(month_names[::-1])
for spine in ('top', 'left', 'right'):
    ax.spines[spine].set(visible=False)
plt.tight_layout()
plt.show()
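Both snippets share the same scaling step: every KDE is divided by the largest peak over all months and multiplied by overlap_factor, so the tallest curve reaches exactly overlap_factor rows above its baseline. A minimal sketch of that step in isolation, where ridge_curves is a hypothetical helper name and the sample data is made up:

```python
import numpy as np
from scipy.stats import gaussian_kde


def ridge_curves(samples_per_row, xs, overlap_factor=1.9, bw_method=0.2):
    """Hypothetical helper: one KDE per row, rescaled so the tallest peak equals overlap_factor."""
    densities = [gaussian_kde(s, bw_method=bw_method)(xs) for s in samples_per_row]
    max_density = max(d.max() for d in densities)
    return [d / max_density * overlap_factor for d in densities]


rng = np.random.default_rng(0)  # made-up sample data, seeded for reproducibility
samples = [rng.integers(10, 21, 30) for _ in range(3)]
xs = np.linspace(7, 23, 200)
curves = ridge_curves(samples, xs)
tallest = max(c.max() for c in curves)  # equals overlap_factor
```

Row i is then drawn as ax.plot(xs, i + curves[i]). An overlap_factor of 1 or less keeps every curve inside its own row; larger values let the curves overlap the rows above.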