I am working on the Abalone dataset and I am facing a weird problem when plotting a boxplot.
If I run:
plt.figure(figsize=(16,6))
plt.subplot(121)
sns.boxplot(data=df['rings'])
it works perfectly!
If I filter the dataset by sex like this:
df_f = df[df['sex']=='F']
df_m = df[df['sex']=='M']
df_i = df[df['sex']=='I']
And I run:
plt.figure(figsize=(16,6))
plt.subplot(121)
sns.boxplot(data=df_m['rings'])
it works perfectly!
But if I run the code above for the df_f and df_i datasets, I get an error:
KeyError: 0
There are no missing values, and all values are int.
What am I missing here?
CodePudding user response:
If you want a box plot per value of a categorical column I suggest:
sns.boxplot(data=df, x='rings', y='sex')
CodePudding user response:
You seem to have encountered a bug. To find out more, it helps to add the full error trace, not just the last line. In this case, the last frame inside seaborn is the important one. At line 447 in categorical.py, there is a test if np.isscalar(data[0]) with data = df_f['rings']. As data is now a pandas Series with an integer index, data[0] looks up the index label 0 rather than the first position, and that label isn't in the selection. The same call presumably works for df_m only because that subset happens to still contain the row labelled 0.
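The lookup behaviour can be checked without seaborn at all. The following is a minimal sketch, assuming a small made-up dataframe (the same one used in the minimal example in this answer) rather than the real Abalone data; the variable names are only illustrative:

```python
import pandas as pd

# Made-up stand-in for the Abalone data: rows 0-1 are 'M', rows 2-3 are 'F'.
df = pd.DataFrame({'Sex': ['M', 'M', 'F', 'F'],
                   'Rings': [1, 2, 3, 4]})

# Filtering keeps the original index labels, so this Series is indexed 2 and 3.
rings_f = df[df['Sex'] == 'F']['Rings']
print(list(rings_f.index))

# With an integer index, rings_f[0] is a label lookup, and label 0 is gone.
try:
    rings_f[0]
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)

# Positional access is unaffected by the missing label.
first_by_position = rings_f.iloc[0]
print(first_by_position)
```

So the error depends only on whether the filtered Series still has an index label 0, not on the values themselves.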
To further investigate the problem, it helps to try to reproduce it with a minimal example:
import seaborn as sns
import pandas as pd
df = pd.DataFrame({'Sex': ['M', 'M', 'F', 'F'],
'Rings': [1, 2, 3, 4]})
df_m = df[df['Sex'] == 'M']
df_f = df[df['Sex'] == 'F']
sns.boxplot(data=df_f['Rings'])
This does reproduce the error.
A workaround is to only pass the values to the seaborn function:
sns.boxplot(data=df_f['Rings'].values)
Or to use the dataframe as data and the column as y:
sns.boxplot(data=df_f, y='Rings')
As the bug is inside seaborn/categorical.py, similar functions will run into the same problem.
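Since the failing check is a lookup of label 0, another option that should help is to renumber the index of the filtered data so that label 0 exists again. This is a sketch on the same made-up dataframe, not something confirmed against every seaborn version; the seaborn call is left as a comment because it is an assumption that it then passes the check:

```python
import pandas as pd

# Same made-up stand-in data as in the minimal example.
df = pd.DataFrame({'Sex': ['M', 'M', 'F', 'F'],
                   'Rings': [1, 2, 3, 4]})

# reset_index(drop=True) discards the old labels (2, 3) and renumbers from 0.
rings_f = df.loc[df['Sex'] == 'F', 'Rings'].reset_index(drop=True)
print(list(rings_f.index))

# The label lookup that raised KeyError: 0 now succeeds.
first_value = rings_f[0]
print(first_value)

# Assumption: with label 0 present, the original call should work again:
# sns.boxplot(data=rings_f)
```

Passing .values, as shown earlier, avoids the index entirely, so it stays the more robust of the two if the index carries no meaning for the plot.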