Aligning x axis numerical values with corresponding categorical variables in seaborn on twinx

Time:11-23

I am trying to plot a seaborn line graph with two x-axes, but with the code below both x-axes end up squeezed into a corner instead of lining up with each other.

What am I missing?

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
plt.style.use("fivethirtyeight")

             
fig, ax = plt.subplots(1,1,figsize=(10,10))
ax2 = ax.twinx()
sns.lineplot(x="BUCKET_SEGMENT_1", y= 'Percentage to Bucket Total', data=df, ax=ax)
sns.lineplot(x="PAST_DUE_DAYS", y= 'Percentage to Bucket Total',data=df, ax=ax2)

plt.show()

Stuck second x-axis

DATA

BUCKET_SEGMENT_1    PAST_DUE_DAYS   BAL Percentage to Bucket Total
0   Bucket1            3.0  878698.045  74.431434
1   Bucket1            4.0  25747.397   2.180971
2   Bucket2            6.0  171683.523  14.54271
3   Bucket2            7.0  55659.448   4.714716
4   Bucket3            8.0  1589.759    0.134662

Here's an example of what I would like to see

[image: example of the desired plot]

I also tried setting limits and xticks on ax and ax2, but couldn't get far:

DF = df.copy()
DF.set_index("BUCKET_SEGMENT_1",inplace=True)
fig, ax = plt.subplots(1,1,figsize=(10,10))
ax2 = ax.twinx()
sns.lineplot(data=DF,x="PAST_DUE_DAYS",y='Percentage to Bucket Total',ax=ax)
sns.lineplot(data=DF.reset_index(),x='BUCKET_SEGMENT_1',y='Percentage to Bucket Total',ax=ax2)
x = int(max(DF.PAST_DUE_DAYS))
ax.set_xlim(0,x)
ax2.set_xticks(DF.index.unique())

Answer:

To find out what is going on, it helps to first plot both graphs in separate subplots:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from io import StringIO

df_str = '''BUCKET_SEGMENT_1    PAST_DUE_DAYS   BAL "Percentage to Bucket Total"
0   Bucket1            3.0  878698.045  74.431434
1   Bucket1            4.0  25747.397   2.180971
2   Bucket2            6.0  171683.523  14.54271
3   Bucket2            7.0  55659.448   4.714716
4   Bucket3            8.0  1589.759    0.134662'''
# sep=r'\s+' splits on whitespace; it replaces delim_whitespace=True, which is deprecated in pandas 2.2+
df = pd.read_csv(StringIO(df_str), sep=r'\s+')

fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(14, 5))
sns.lineplot(x="BUCKET_SEGMENT_1", y='Percentage to Bucket Total', data=df, ax=ax1)
sns.lineplot(x="PAST_DUE_DAYS", y='Percentage to Bucket Total', data=df, ax=ax2)
ax2.set_xticks(df["PAST_DUE_DAYS"])
plt.tight_layout()
plt.show()

sns.lineplot categorical vs numerical

The left subplot uses the categorical bucket names for the x-axis. The percentages within each bucket are averaged, and an error band is shown. Bucket1, Bucket2 and Bucket3 are at internal positions 0, 1 and 2.

The right subplot uses the numerical PAST_DUE_DAYS positions for the x-axis. There is only one percentage value per day, so no error band is needed.

ax.twinx() creates a second y-axis that shares the x-axis with the original axes. The two plots therefore mix the categorical positions of the first subplot (0, 1, 2) with the numerical positions of the second subplot (3, 4, 6, 7, 8). The result is the weird plot you see (which was clearly made from a different dataframe than the example data).
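The mixing of positions can be checked directly. The following is a minimal sketch that uses plain matplotlib instead of seaborn (an assumption made only to keep the example small); the y-values are rounded per-bucket means of the sample data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax2 = ax.twinx()  # new y-axis on the right; the x-axis is shared with ax

# String x-values are treated as categories and placed at 0, 1, 2.
(cat_line,) = ax.plot(["Bucket1", "Bucket2", "Bucket3"], [38.3, 9.6, 0.1])
# Numerical x-values are placed at their actual values.
(num_line,) = ax2.plot([3, 4, 6, 7, 8], [74.4, 2.2, 14.5, 4.7, 0.1])

# orig=False returns the positions after unit conversion.
cat_positions = list(cat_line.get_xdata(orig=False))
num_positions = list(num_line.get_xdata(orig=False))
shared_xlim = ax.get_xlim()

print("categorical positions:", cat_positions)
print("numerical positions:", num_positions)
print("both axes use the same x-limits:", shared_xlim == ax2.get_xlim())
```

Because both lines live on one shared x-range, the three buckets are drawn to the left of the day values rather than above them.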

Now, to mark the buckets in the numerical plot, you could use groupby() to find their positions. The minimum and maximum days of each bucket tell you where to place a text label. Alternating colored bands can also visualize the ranges.
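As a rough sketch of the groupby() step on its own, the snippet below computes the day range of each bucket from the sample data. The names `spans` and `center` are illustrative and not part of the original code.

```python
import pandas as pd

# Sample data from the question (only the columns needed here).
df = pd.DataFrame({
    "BUCKET_SEGMENT_1": ["Bucket1", "Bucket1", "Bucket2", "Bucket2", "Bucket3"],
    "PAST_DUE_DAYS": [3.0, 4.0, 6.0, 7.0, 8.0],
})

# One row per bucket with the first and last day that falls into it.
spans = df.groupby("BUCKET_SEGMENT_1")["PAST_DUE_DAYS"].agg(["min", "max"])
# Midpoint of each range, a natural x-position for a label.
spans["center"] = (spans["min"] + spans["max"]) / 2

print(spans)
```
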

Here is some example code to get you started:

fig, ax = plt.subplots(figsize=(10, 5))
sns.lineplot(x="PAST_DUE_DAYS", y='Percentage to Bucket Total', data=df, ax=ax)
colors = ['red', 'green']
previous_max_days = 0
for (group_name, group_data), color in zip(df.groupby("BUCKET_SEGMENT_1"), colors * len(df)):
    # min_days = group_data['PAST_DUE_DAYS'].min() - 0.5
    # each band starts where the previous bucket's band ended
    min_days = previous_max_days
    max_days = group_data['PAST_DUE_DAYS'].max() + 0.5
    ax.axvspan(min_days, max_days, color=color, alpha=0.1)
    # x is in data coordinates, y in axes coordinates (0.9 = near the top)
    ax.text((min_days + max_days) / 2, 0.9, group_name, ha='center', fontsize=20, color=color,
            transform=ax.get_xaxis_transform())
    previous_max_days = max_days
plt.show()

sns.lineplot with bands per bucket
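If a second x-axis with the bucket names is still wanted, one possible alternative (not part of the answer above) is ax.twiny(), the matplotlib counterpart of twinx() that adds a second x-axis on top. The sketch below assumes the top axis is given the same numeric limits as the bottom one and that each bucket is labeled at the mean of its days; it uses plain ax.plot instead of sns.lineplot to stay small, and `ax_top` and `centers` are illustrative names.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "BUCKET_SEGMENT_1": ["Bucket1", "Bucket1", "Bucket2", "Bucket2", "Bucket3"],
    "PAST_DUE_DAYS": [3.0, 4.0, 6.0, 7.0, 8.0],
    "Percentage to Bucket Total": [74.431434, 2.180971, 14.54271, 4.714716, 0.134662],
})

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(df["PAST_DUE_DAYS"], df["Percentage to Bucket Total"], marker="o")
ax.set_xlabel("PAST_DUE_DAYS")
ax.set_ylabel("Percentage to Bucket Total")

# Label position for each bucket: the mean of its days (an assumed choice).
centers = df.groupby("BUCKET_SEGMENT_1")["PAST_DUE_DAYS"].mean()

ax_top = ax.twiny()             # second x-axis on top; the y-axis is shared
ax_top.set_xlim(ax.get_xlim())  # same numeric range, so positions line up
ax_top.set_xticks(centers.to_numpy())
ax_top.set_xticklabels(list(centers.index))
ax_top.grid(False)
```

The limits are copied once, so if the bottom axis is rescaled later, `ax_top.set_xlim(ax.get_xlim())` has to be called again. In a notebook, drop the `matplotlib.use("Agg")` line and finish with `plt.show()`.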
