I tried to drop some values and then plot a countplot, but the values are still there. What am I doing wrong?
df = df.drop(df[(df['market_segment'] == 'Undefined') & (df['market_segment'] == 'Aviation')].index)
plt.figure(figsize=(12,8))
sns.countplot(x='market_segment',data=df,hue='hotel')
plt.show()
Answer:
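There are two reasons this may be happening:
1. The filter in your first line is incorrect. A single value can never equal both 'Undefined' and 'Aviation', so the combined mask is always False and no rows are dropped.
2. Your "market_segment" column may have a categorical dtype. A categorical Series keeps its full list of categories even after the matching rows are removed, and those unobserved categories can be propagated into seaborn, which still gives them a slot on the axis. Converting the column to an object (string) dtype remedies this.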
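import matplotlib.pyplot as plt
import seaborn as sns

# Keep only rows whose segment is NOT one of the unwanted values,
# then convert the column from categorical to plain strings so
# seaborn does not plot the now-unused categories.
df = (
    df.loc[~df["market_segment"].isin(["Undefined", "Aviation"])]
    .astype({"market_segment": str})
)

plt.figure(figsize=(12, 8))
sns.countplot(x="market_segment", data=df, hue="hotel")
plt.show()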
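A minimal sketch of the categorical issue, using a small made-up frame instead of the real hotel data (the sample rows are assumptions). It shows that filtering removes the rows but not the categories, and two ways to get rid of the leftovers: Series.cat.remove_unused_categories(), or the astype(str) conversion used above.

```python
import pandas as pd

# Hypothetical sample frame standing in for the hotel bookings data.
df = pd.DataFrame({
    "market_segment": pd.Categorical(
        ["Online TA", "Aviation", "Direct", "Undefined", "Online TA"]
    ),
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel", "City Hotel"],
})

filtered = df.loc[~df["market_segment"].isin(["Undefined", "Aviation"])]

# The unwanted rows are gone, but the dtype still lists every category.
print(list(filtered["market_segment"].cat.categories))

# Option 1: keep the categorical dtype but drop categories with no rows.
pruned = filtered.assign(
    market_segment=filtered["market_segment"].cat.remove_unused_categories()
)
print(list(pruned["market_segment"].cat.categories))

# Option 2: convert to plain strings, so there is no category list at all.
as_str = filtered.astype({"market_segment": str})
print(isinstance(as_str["market_segment"].dtype, pd.CategoricalDtype))
```

Option 1 is handy if you want to keep the memory savings or a custom category order; option 2 is the simpler fix.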
Answer:
The problem is that you're removing rows where market_segment is both Undefined and Aviation at the same time. That, obviously, is impossible: no single value can be both, so the condition never matches and nothing gets dropped.
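To see why nothing is dropped, here is a minimal sketch on a few made-up rows (the sample values are hypothetical) comparing the & mask with the | mask:

```python
import pandas as pd

# Hypothetical sample values for the market_segment column.
df = pd.DataFrame({"market_segment": ["Online TA", "Aviation", "Direct", "Undefined"]})

is_undefined = df["market_segment"] == "Undefined"
is_aviation = df["market_segment"] == "Aviation"

and_mask = is_undefined & is_aviation  # True only if a value equals BOTH strings
or_mask = is_undefined | is_aviation   # True if a value equals EITHER string

print(and_mask.sum())  # 0 rows selected
print(or_mask.sum())   # 2 rows selected

# Dropping with the AND mask removes nothing.
unchanged = df.drop(df[and_mask].index)
print(len(unchanged))  # 4
```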
Change your AND (&) to OR (|):
# changed & (AND) to | (OR) between the two conditions
df = df.drop(df[(df['market_segment'] == 'Undefined') | (df['market_segment'] == 'Aviation')].index)
That way, all rows where market_segment is either Undefined or Aviation will be dropped. If a row has one of those values, it will be removed.
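As a quick check, a minimal sketch (on hypothetical sample rows) showing that the corrected OR-based drop selects the same rows as the isin-based filter from the other answer:

```python
import pandas as pd

# Hypothetical sample rows standing in for the real data.
df = pd.DataFrame({
    "market_segment": ["Online TA", "Aviation", "Direct", "Undefined", "Online TA"],
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel", "City Hotel"],
})

# Corrected approach: OR the two conditions, then drop the matching index labels.
mask = (df["market_segment"] == "Undefined") | (df["market_segment"] == "Aviation")
dropped = df.drop(df[mask].index)

# Same selection with isin and boolean indexing (no index labels involved).
filtered = df[~df["market_segment"].isin(["Undefined", "Aviation"])]

print(dropped.equals(filtered))                    # True
print(sorted(dropped["market_segment"].unique()))  # ['Direct', 'Online TA']
```

One reason to prefer boolean indexing: drop() works on index labels, so if the frame has duplicate index labels it removes every row sharing a matched label, including rows that did not match the condition.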