I was using this code to plot all data in my df:
import matplotlib.pyplot as plt
import seaborn as sns
# df is the bookings DataFrame, loaded earlier
num_cols = ['is_canceled', 'lead_time', 'arrival_date_year', 'arrival_date_week_number', 'arrival_date_day_of_month', 'stays_in_weekend_nights', 'adults', 'children', 'babies', 'is_repeated_guest', 'previous_cancellations', 'previous_bookings_not_canceled', 'booking_changes', 'agent', 'total_of_special_requests']
for col in num_cols:
    # One horizontal boxplot per numeric column, split by cancellation status
    sns.boxplot(y=df['is_canceled'].astype('category'), x=col, data=df)
    plt.show()
But a few of the plots look like this. How can I fix it?
CodePudding user response:
The boxplots seem to show that the large majority of values are zero, so the rest are drawn as outliers. For example, previous_cancellations is usually zero, and only a few rows have some specific nonzero value. All outliers with the same value are drawn on top of each other. Note that the "box" of a