I am trying to plot a seaborn line graph with two x-axes, but with the code below both x-axes end up squeezed into a corner instead of having their values aligned with each other.
What am I missing?
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
plt.style.use("fivethirtyeight")
fig, ax = plt.subplots(1,1,figsize=(10,10))
ax2 = ax.twinx()
sns.lineplot(x="BUCKET_SEGMENT_1", y= 'Percentage to Bucket Total', data=df, ax=ax)
sns.lineplot(x="PAST_DUE_DAYS", y= 'Percentage to Bucket Total',data=df, ax=ax2)
plt.show()
DATA
   BUCKET_SEGMENT_1  PAST_DUE_DAYS         BAL  Percentage to Bucket Total
0  Bucket1                     3.0  878698.045                   74.431434
1  Bucket1                     4.0   25747.397                    2.180971
2  Bucket2                     6.0  171683.523                   14.54271
3  Bucket2                     7.0   55659.448                    4.714716
4  Bucket3                     8.0    1589.759                    0.134662
Here's an example of what I would like to see
I also tried setting the limits and xticks on ax and ax2, but couldn't get far:
DF = df.copy()
DF.set_index("BUCKET_SEGMENT_1",inplace=True)
fig, ax = plt.subplots(1,1,figsize=(10,10))
ax2 = ax.twinx()
sns.lineplot(data=DF,x="PAST_DUE_DAYS",y='Percentage to Bucket Total',ax=ax)
sns.lineplot(data=DF.reset_index(),x='BUCKET_SEGMENT_1',y='Percentage to Bucket Total',ax=ax2)
x = int(max(DF.PAST_DUE_DAYS))
ax.set_xlim(0,x)
ax2.set_xticks(DF.index.unique())
CodePudding user response:
To find out what is going on, it helps to first plot both graphs in separate subplots:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from io import StringIO
df_str = '''BUCKET_SEGMENT_1 PAST_DUE_DAYS BAL "Percentage to Bucket Total"
0 Bucket1 3.0 878698.045 74.431434
1 Bucket1 4.0 25747.397 2.180971
2 Bucket2 6.0 171683.523 14.54271
3 Bucket2 7.0 55659.448 4.714716
4 Bucket3 8.0 1589.759 0.134662'''
# sep=r'\s+' splits on whitespace; delim_whitespace is deprecated in recent pandas versions
df = pd.read_csv(StringIO(df_str), sep=r'\s+')
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(14, 5))
sns.lineplot(x="BUCKET_SEGMENT_1", y='Percentage to Bucket Total', data=df, ax=ax1)
sns.lineplot(x="PAST_DUE_DAYS", y='Percentage to Bucket Total', data=df, ax=ax2)
ax2.set_xticks(df["PAST_DUE_DAYS"])
plt.tight_layout()
plt.show()
The left subplot uses the categorical bucket names for the x-axis. The percentages within each bucket are averaged, and an error band is shown. Bucket1, Bucket2 and Bucket3 are at internal positions 0, 1 and 2.
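To make the averaging concrete, here is a minimal sketch that computes the per-bucket mean with pandas. It assumes seaborn's default estimator (the mean) is in effect, and the DataFrame is simply retyped from the question's data.

```python
import pandas as pd

# Data retyped from the question.
df = pd.DataFrame({
    "BUCKET_SEGMENT_1": ["Bucket1", "Bucket1", "Bucket2", "Bucket2", "Bucket3"],
    "PAST_DUE_DAYS": [3.0, 4.0, 6.0, 7.0, 8.0],
    "Percentage to Bucket Total": [74.431434, 2.180971, 14.54271, 4.714716, 0.134662],
})

# One mean per bucket: this is the height of the line at each bucket name.
bucket_means = df.groupby("BUCKET_SEGMENT_1")["Percentage to Bucket Total"].mean()
print(bucket_means)
```

Bucket1 and Bucket2 each have two rows, so their points on the left subplot sit between the two original values; Bucket3 has a single row and is plotted as-is.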
The right subplot uses the numerical PAST_DUE_DAYS positions for the x-axis. There is only one percentage value per day, so no error band is needed.
ax.twinx() creates a second axes that shares the x-axis with the first. This mixes the categorical positions of the first plot (0, 1, 2) with the numerical positions of the second plot (3, 4, 6, 7, 8). The result is the weird plot you see (clearly from a different dataframe than the example).
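The mismatch can be checked directly. The following is a small sketch using plain matplotlib (not seaborn) with the non-interactive Agg backend; the y-values are rounded, illustrative bucket averages rather than exact figures.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Plotting strings on the x-axis makes matplotlib treat them as categories.
# The y-values are rounded, illustrative averages.
ax.plot(["Bucket1", "Bucket2", "Bucket3"], [38.3, 9.6, 0.1])

# The categories are placed at consecutive integer positions.
category_positions = list(ax.get_xticks())
print(category_positions)

# The day values live on an ordinary numeric scale instead.
day_positions = [3.0, 4.0, 6.0, 7.0, 8.0]
print(min(day_positions), max(day_positions))
```

Because the bucket positions and the day values end up on one shared x-axis, the two lines cover different stretches of it, and nothing ties a bucket name to the days it contains.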
Now, to mark the buckets in the numerical plot, you could use groupby() to find their positions. With the minimum and maximum days of each bucket, a text label can be positioned. Alternating colored bands can also visualize the ranges.
Here is some example code to get you started:
fig, ax = plt.subplots(figsize=(10, 5))
sns.lineplot(x="PAST_DUE_DAYS", y='Percentage to Bucket Total', data=df, ax=ax)
colors = ['red', 'green']
previous_max_days = 0
for (group_name, group_data), color in zip(df.groupby("BUCKET_SEGMENT_1"), colors * len(df)):
    # min_days = group_data['PAST_DUE_DAYS'].min() - 0.5
    min_days = previous_max_days  # each band starts where the previous one ended
    max_days = group_data['PAST_DUE_DAYS'].max() + 0.5
    ax.axvspan(min_days, max_days, color=color, alpha=0.1)
    # x is in data coordinates, y in axes coordinates (0.9 = near the top)
    ax.text((min_days + max_days) / 2, 0.9, group_name, ha='center', fontsize=20, color=color,
            transform=ax.get_xaxis_transform())
    previous_max_days = max_days
plt.show()
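If a real second x-axis is preferred over text labels, one possible variation (not part of the approach above) is to add a top axis with ax.twiny() and place the bucket names at the midpoint of each bucket's day range. This is only a sketch under assumptions: it uses plain matplotlib with the Agg backend, the names spans, midpoints and ax_top are illustrative, and the top axis limits are copied once, so they would need to be copied again if the bottom limits change later.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Data retyped from the question.
df = pd.DataFrame({
    "BUCKET_SEGMENT_1": ["Bucket1", "Bucket1", "Bucket2", "Bucket2", "Bucket3"],
    "PAST_DUE_DAYS": [3.0, 4.0, 6.0, 7.0, 8.0],
    "Percentage to Bucket Total": [74.431434, 2.180971, 14.54271, 4.714716, 0.134662],
})

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(df["PAST_DUE_DAYS"], df["Percentage to Bucket Total"])
ax.set_xlabel("PAST_DUE_DAYS")

# Day range covered by each bucket, and the midpoint of that range.
spans = df.groupby("BUCKET_SEGMENT_1")["PAST_DUE_DAYS"].agg(["min", "max"])
midpoints = (spans["min"] + spans["max"]) / 2

# twiny() adds a second x-axis on top that shares the y-axis.
ax_top = ax.twiny()
ax_top.set_xlim(ax.get_xlim())  # same numeric range, so ticks line up with the days
ax_top.set_xticks(midpoints.to_list())
ax_top.set_xticklabels(spans.index.to_list())
ax_top.set_xlabel("BUCKET_SEGMENT_1")

fig.savefig("buckets_top_axis.png")
```

The key difference from the question's code is twiny() instead of twinx(): twinx() duplicates the y-axis and shares x, while twiny() duplicates the x-axis and shares y.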