Objective:
I want to create a stacked histogram of the PaperlessBilling
categorical feature (Telco Customer Churn dataset), with the Y axis shown as a percentage and the churn distribution as the hue. However, the percentages should be computed within each category, so that every bar sums to 100%, rather than as shares of the whole dataset.
Here is the result I expect, written in R (this example uses SeniorCitizen, but the idea is the same):
library(ggplot2)
ggplot(Churn, aes(SeniorCitizen, fill = Churn)) +
  geom_bar(position = "fill") +
  xlab("Senior Citizen status") +
  ylab("Percent") +
  scale_y_continuous(labels = scales::percent) +
  scale_x_discrete(labels = c("Non-Senior Citizens", "Senior Citizens")) +
  scale_fill_manual(name = "Churn Status", values = c("green2", "red1"), labels = c("No", "Yes")) +
  ggtitle("The Ratio of Churns by Senior Citizen status") +
  theme_classic() +
  theme(legend.position = "bottom",
        plot.title = element_text(hjust = 0.5, size = 15))
Here is the output of the above code (note that each category totals 100%):
Here is what I've done:
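import matplotlib.pyplot as plt
import seaborn as sns

# df is the Telco Customer Churn data, already loaded as a pandas DataFrame
fig, axs = plt.subplots(figsize=(5, 5))
sns.histplot(
    df,
    x="PaperlessBilling",
    hue="Churn",
    multiple="stack",
    stat="percent",
)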
This is the output of the above code:
CodePudding user response:
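The per-category percentages the question asks for can be sketched by normalizing within each PaperlessBilling level before plotting. This is only one possible approach, not necessarily the shortest: it swaps `sns.histplot` for a pandas crosstab plus a stacked bar chart. The small `df` below is made-up stand-in data (the real Telco file is not loaded here), and the legend placement is just a guess at the R layout.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import PercentFormatter

# Toy stand-in for the Telco data (made-up rows, for illustration only).
df = pd.DataFrame({
    "PaperlessBilling": ["Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No"],
    "Churn":            ["Yes", "Yes", "No",  "No",  "No", "No", "No", "Yes"],
})

# Row-normalized contingency table: one row per PaperlessBilling level,
# one column per Churn value, and every row sums to 100.
pct = pd.crosstab(df["PaperlessBilling"], df["Churn"], normalize="index") * 100

# Stacked bars, so each category fills the axis up to 100%.
fig, ax = plt.subplots(figsize=(5, 5))
pct.plot(kind="bar", stacked=True, ax=ax, rot=0)
ax.set_ylabel("Percent")
ax.yaxis.set_major_formatter(PercentFormatter(100))  # ticks shown as 0%..100%
ax.legend(title="Churn", loc="upper center", bbox_to_anchor=(0.5, -0.12), ncol=2)
fig.tight_layout()
```

If you would rather stay in seaborn, `multiple="fill"` in `sns.histplot` normalizes each bar to a height of 1, which is the closer analogue of `position = "fill"` in ggplot2; I have not run that against your data, so treat it as a pointer rather than a tested fix.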