How to put two Pandas box plots next to each other? Or group them by variable?

Time:12-30

I have two data frames (df1 and df2). Each has the same 10 variables, but with different values.

I created box plots of the variables in the data frames like so:

df1.boxplot()
df2.boxplot()

I get two graphs, each with 10 box plots next to each other (one per variable). However, only the second graph shows up in the output, since Python just runs the code in order.

Instead, I would like these two graphs to appear side by side OR, ideally, I would like 10 graphs (one for each variable), each comparing that variable across the data frames (e.g. one graph for the first variable with two box plots in it, one for each data frame). Is that possible using just pandas, or do I have to use Matplotlib?

Thanks!

CodePudding user response:

To get graphs, standard Python isn't enough. You'd need a graphical library such as matplotlib. Seaborn extends matplotlib to ease the creation of complex statistical plots. To work with Seaborn, the dataframes should be converted to long form (e.g. via pandas' melt) and then combined into one large dataframe.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

# suppose df1 and df2 are dataframes, each with the same 10 columns
df1 = pd.DataFrame({i: np.random.randn(100).cumsum() for i in 'abcdefghij'})
df2 = pd.DataFrame({i: np.random.randn(150).cumsum() for i in 'abcdefghij'})

# pd.melt converts the dataframe to long form, pd.concat combines them
df = pd.concat({'df1': df1.melt(), 'df2': df2.melt()}, names=['source', 'old_index'])
# convert the source index to a column, and reset the old index
df = df.reset_index(level=0).reset_index(drop=True)

sns.boxplot(data=df, x='variable', y='value', hue='source', palette='turbo')

This creates boxes for each of the original columns, comparing the two dataframes:

[Figure: seaborn boxplot from two wide dataframes]
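
To see what the melt-and-concat step produces, here is a minimal sketch on two tiny stand-in frames. The names small1, small2 and long_df and their values are made up for illustration; the reshaping calls are the same ones used above.

```python
import pandas as pd

# Two tiny stand-in frames (hypothetical data) with the same columns
small1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
small2 = pd.DataFrame({'a': [5, 6], 'b': [7, 8]})

# melt() stacks the columns into 'variable'/'value' pairs;
# the dict keys passed to concat become the 'source' index level
long_df = pd.concat({'df1': small1.melt(), 'df2': small2.melt()},
                    names=['source', 'old_index'])
# move 'source' from the index into a column, then drop the old row index
long_df = long_df.reset_index(level=0).reset_index(drop=True)

print(long_df)
```

Each row of the long frame holds one observation, tagged with the column it came from ('variable') and the dataframe it came from ('source'). Seaborn maps those tags to x, hue or col.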

Optionally, you could create multiple subplots with that same information:

sns.catplot(data=df, kind='box', col='variable', y='value', x='source',
            palette='turbo', height=3, aspect=0.5, col_wrap=5)

[Figure: sns.catplot to create separate subplots]

By default, the y-axes are shared. You can disable the sharing via sharey=False. Here is an example, which also removes the repeated x axes and creates a common legend:

g = sns.catplot(data=df, kind='box', col='variable', y='value', x='source', hue='source', dodge=False,
            palette='Reds', height=3, aspect=0.5, col_wrap=5, sharey=False)
g.set(xlabel='', xticks=[]) # remove x labels and ticks
g.add_legend()

[Figure: sns.catplot with non-shared y-axis, and x-axis converted to legend]

PS: If you simply want to put two pandas boxplots next to each other, you can create a figure with two subplots, and pass the axes to pandas. (Note that pandas plotting is just an interface towards matplotlib.)

fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(15, 5))

df1.boxplot(ax=ax1)
ax1.set_title('df1')
df2.boxplot(ax=ax2)
ax2.set_title('df2')

plt.tight_layout()
plt.show()

[Figure: two pandas boxplots next to each other]
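
If you would rather avoid Seaborn, the "one graph per variable" layout can also be sketched with matplotlib alone. The helper compare_boxplots below is hypothetical (it is not part of pandas or matplotlib), and it assumes all dataframes share the same columns. The Agg backend line is only there so the sketch also runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless run; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# stand-in data, same shape as in the example above
rng = np.random.default_rng(0)
df1 = pd.DataFrame({c: rng.standard_normal(100).cumsum() for c in 'abcdefghij'})
df2 = pd.DataFrame({c: rng.standard_normal(150).cumsum() for c in 'abcdefghij'})

def compare_boxplots(frames, ncols=5):
    """Hypothetical helper: one subplot per column, one box per dataframe.

    frames maps a label to a DataFrame; all frames must share the same columns.
    """
    labels = list(frames)
    columns = list(next(iter(frames.values())).columns)
    nrows = -(-len(columns) // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols, figsize=(3 * ncols, 3 * nrows),
                             squeeze=False)
    for ax, col in zip(axes.flat, columns):
        # one array per dataframe; lengths may differ between frames
        ax.boxplot([frames[k][col].dropna().to_numpy() for k in labels])
        ax.set_xticks(range(1, len(labels) + 1))
        ax.set_xticklabels(labels)
        ax.set_title(col)
    # hide any unused subplots in the last row
    for ax in axes.flat[len(columns):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig, axes

fig, axes = compare_boxplots({'df1': df1, 'df2': df2})
```

In an interactive session, call plt.show() afterwards. Each subplot gets its own y-axis here, which corresponds to sharey=False in the catplot version.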
