pyplot.text() on a boxplot without a y coordinate position

Time:10-27

I am trying to create labels for the median, outliers, and quartiles of a one-dimensional boxplot that has only x-coordinate values. I'd like each label to show the query, URL, and CTR. Here is what the DataFrame looks like:

URL            Clicks  CTR   Query
website.com/1  20      0.06  query1
website.com/2  4       0.10  query2

My boxplot without labels: [image]

My code for the above plot:

import matplotlib.pyplot as plt
import seaborn as sns

df_ = df[df.Clicks > 4]  # keep only rows with more than 4 clicks
sns.boxplot(x=df_['CTR'])
plt.xlabel("CTR")
plt.show()

What I have so far are the values and outlier limit:

median = df_['CTR'].median()
ctr_q1 = df_['CTR'].quantile(0.25)
ctr_q3 = df_['CTR'].quantile(0.75)
outlier_lim = ctr_q3 + 1.5 * (ctr_q3 - ctr_q1)  # upper limit: Q3 + 1.5 * IQR
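Since the goal is to label outliers with their query and URL, it can help to select whole rows rather than bare CTR values. A minimal pandas sketch, assuming hypothetical sample data shaped like the DataFrame above: it computes both Tukey fences (1.5 * IQR beyond each quartile, which matches seaborn's default `whis=1.5`) and keeps the matching rows with a boolean mask.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's DataFrame
df_ = pd.DataFrame({
    "URL": [f"website.com/{i}" for i in range(1, 9)],
    "Clicks": [20, 8, 12, 30, 9, 15, 11, 6],
    "CTR": [0.06, 0.05, 0.07, 0.08, 0.06, 0.05, 0.07, 0.60],
    "Query": [f"query{i}" for i in range(1, 9)],
})

ctr = df_["CTR"]
q1, median, q3 = ctr.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Tukey fences: 1.5 * IQR beyond the quartiles
upper_fence = q3 + 1.5 * iqr
lower_fence = q1 - 1.5 * iqr

# A boolean mask keeps whole rows, so Query and URL stay available for labels
outliers = df_[(ctr > upper_fence) | (ctr < lower_fence)]
print(outliers[["Query", "URL", "CTR"]])
```

Checking the lower fence as well matters only if low CTR values can be outliers too; the question's loop only tests the upper limit.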

My problem is that when adding text, I don't know what to pass as y to plt.text(), since the plot has no y values:

for i in df_["CTR"]:
    if i > outlier_lim:
        plt.text(x=i, y=?????, s="here")

If I try putting an arbitrary value like 0 or 1 for y, I get something like this:

>>> for i in df_["CTR"]:
...     if i > outlier_lim:
...         plt.text(x = i, y = 0, s = "here")
... 
Text(0.6923076923076923, 0, 'here')
Text(0.47619047619047616, 0, 'here')
Text(0.5333333333333333, 0, 'here')
Text(0.4583333333333333, 0, 'here')
Text(0.5, 0, 'here')
Text(0.5, 0, 'here')
Text(0.5, 0, 'here')
Text(0.5384615384615384, 0, 'here')
Text(0.5833333333333334, 0, 'here')
Text(0.5, 0, 'here')
Text(0.5, 0, 'here')
Text(0.55, 0, 'here')
Text(0.6153846153846154, 0, 'here')
>>> plt.xlabel("CTR")
Text(0.5, 0, 'CTR')
>>> plt.show()

[image: the resulting plot]

Most of the related posts I've seen use either seaborn or matplotlib functions that require a y parameter. Does anyone have a solution for when y doesn't exist?

Thanks!

Answer:

The central line of the box is at y=0, and the box extends from y=-0.4 to y=0.4 (seaborn's default box width is 0.8). Note that the y-axis is inverted, so negative values are at the top. The y-values do exist; seaborn just hides them so they don't distract.
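If you would rather not depend on those data positions at all, one alternative (a different technique from the one in this answer) is a blended transform: `ax.get_xaxis_transform()` interprets x in data units and y in axes units, where 0 is the bottom and 1 the top of the plot area, regardless of the y-limits or the axis inversion. A minimal sketch, assuming the same kind of randomly generated sample data; the label height of 0.45 is an arbitrary choice:

```python
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

# Hypothetical sample data, generated the same way as in the example below
np.random.seed(2021)
df_ = pd.DataFrame({"CTR": np.random.geometric(0.5, size=80) / 100})

fig, ax = plt.subplots()
sns.boxplot(x=df_["CTR"], ax=ax)

# Blended transform: x in data units (CTR), y in axes units (0 = bottom, 1 = top)
trans = ax.get_xaxis_transform()

q1, q3 = df_["CTR"].quantile([0.25, 0.75])
outlier_lim = q3 + 1.5 * (q3 - q1)

labels = []
# unique() avoids stacking identical labels on the same x position
for value in df_.loc[df_["CTR"] > outlier_lim, "CTR"].unique():
    # y=0.45 means 45% of the way up the axes, whatever the data limits are
    labels.append(ax.text(value, 0.45, "here", transform=trans,
                          ha="center", va="top"))

plt.show()
```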

Here is some example code (note that seaborn automatically sets the xlabel to the name of the column):

from matplotlib import pyplot as plt
from matplotlib.ticker import MultipleLocator, ScalarFormatter
import seaborn as sns
import numpy as np
import pandas as pd

np.random.seed(2021)
df_ = pd.DataFrame({'CTR': np.random.geometric(0.5, size=80) / 100})
ax = sns.boxplot(x=df_['CTR'])

# show the ytick positions, as a reference
ax.yaxis.set_major_locator(MultipleLocator(0.1))
ax.yaxis.set_major_formatter(ScalarFormatter())

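median = df_['CTR'].median()
ctr_q1 = df_['CTR'].quantile(0.25)
ctr_q3 = df_['CTR'].quantile(0.75)
outlier_lim = ctr_q3 + 1.5 * (ctr_q3 - ctr_q1)
for i in df_["CTR"]:
    if i > outlier_lim:
        # y=0.01 is just below the center line (the y-axis is inverted);
        # va='top' makes the text hang downward from that point
        ax.text(x=i, y=0.01, s="here", ha='center', va='top')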
plt.show()

[image: sns.boxplot with added text]
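To get closer to the original goal (labeling the median, quartiles, and outliers with query, URL, and CTR), the same y positions can be combined with `ax.annotate` and offsets in points. A sketch under several assumptions: the data is hypothetical, a quartile or median is labeled with the row whose CTR is closest to it (the statistic need not match any row), and the point offsets are arbitrary values you will probably need to tune. The y=-0.4 anchor relies on the box edges described above.

```python
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

# Hypothetical data shaped like the question's DataFrame (URL, Clicks, CTR, Query)
df_ = pd.DataFrame({
    "URL": [f"website.com/{i}" for i in range(1, 9)],
    "Clicks": [20, 8, 12, 30, 9, 15, 11, 6],
    "CTR": [0.06, 0.05, 0.07, 0.08, 0.06, 0.05, 0.07, 0.60],
    "Query": [f"query{i}" for i in range(1, 9)],
})
ctr = df_["CTR"]

fig, ax = plt.subplots()
sns.boxplot(x=ctr, ax=ax)

q1, median, q3 = ctr.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1
is_outlier = (ctr > q3 + 1.5 * iqr) | (ctr < q1 - 1.5 * iqr)


def describe(row):
    """One-line 'query, URL, CTR' label for a DataFrame row."""
    return f"{row['Query']}, {row['URL']}, CTR={row['CTR']:.2f}"


annotations = []
# Assumption: label each statistic with the row whose CTR is closest to it
for i, (name, value) in enumerate([("Q1", q1), ("median", median), ("Q3", q3)]):
    row = df_.loc[(ctr - value).abs().idxmin()]
    annotations.append(ax.annotate(
        f"{name}: {describe(row)}",
        xy=(value, -0.4),  # top edge of the box (the y-axis is inverted)
        xytext=(0, 4 + 12 * i), textcoords="offset points",  # stack labels upward
        ha="left", va="bottom", fontsize=7,
    ))

# Outliers: hang labels below the center line, staggered to reduce overlap
for k, (_, row) in enumerate(df_[is_outlier].iterrows()):
    annotations.append(ax.annotate(
        describe(row),
        xy=(row["CTR"], 0),
        xytext=(0, -10 - 15 * (k % 3)), textcoords="offset points",
        ha="center", va="top", fontsize=7,
        arrowprops={"arrowstyle": "-"},
    ))

plt.show()
```

Offsets given in points are measured on screen, so a positive offset always moves text up, even though the data y-axis is inverted.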
