Horizontal barplot with offset in seaborn

Time:12-30

My dataset looks like this, where the data points in each row or column are pandas objects: [image]

Here is the dataset: [image]

I have tried something like this, and I can get bar plots for each individual Pi (i = 1...14), but not only is it laborious, it also does not look like what I want:

fig, ax = plt.subplots(2, 3, figsize=(8,2))

sns.stripplot(data=df.loc[7]['92A11'].split(','), dodge=True, linewidth=1, ax=ax[0,0], color='black', jitter=False, orient='h')
sns.violinplot(data=df.loc[7]['92A11'].split(','),  ax=ax[0,0], color='orange', orient='h')

sns.stripplot(data=df.loc[7]['92B11'].split(','), dodge=True, linewidth=1, ax=ax[0,1], color='black', jitter=False, orient='h')
sns.violinplot(data=df.loc[7]['92B11'].split(','),  ax=ax[0,1], color='orange', orient='h')

sns.stripplot(data=df.loc[7]['82B11'].split(','), dodge=True, linewidth=1, ax=ax[0,2], color='black', jitter=False, orient='h')
sns.violinplot(data=df.loc[7]['82B11'].split(','),  ax=ax[0,2], color='orange', orient='h')

sns.stripplot(data=df.loc[6]['92A11'].split(','), dodge=True, linewidth=1, ax=ax[1,0], color='black', jitter=False, orient='h')
sns.violinplot(data=df.loc[6]['92A11'].split(','),  ax=ax[1,0], color='orange', orient='h')

sns.stripplot(data=df.loc[6]['92B11'].split(','), dodge=True, linewidth=1, ax=ax[1,1], color='black', jitter=False, orient='h')
sns.violinplot(data=df.loc[6]['92B11'].split(','),  ax=ax[1,1], color='orange', orient='h')

sns.stripplot(data=df.loc[6]['82B11'].split(','), dodge=True, linewidth=1, ax=ax[1,2], color='black', jitter=False, orient='h')
sns.violinplot(data=df.loc[6]['82B11'].split(','),  ax=ax[1,2], color='orange', orient='h')

ax[0,0].set_xlim(0,200)
ax[0,1].set_xlim(0,200)
ax[0,2].set_xlim(0,200)
ax[1,0].set_xlim(0,200)
ax[1,1].set_xlim(0,200)
ax[1,2].set_xlim(0,200)

ax[1,0].set_xlabel('92A11')
ax[1,1].set_xlabel('92B11')
ax[1,2].set_xlabel('82B11')

ax[0,0].set_ylabel('P8')
ax[1,0].set_ylabel('P7')
fig.tight_layout()


CodePudding user response:

Seaborn works most easily with data in long form. pd.melt (or DataFrame.melt) can convert the three individual columns into a variable-name column and a value column. DataFrame.explode can convert lists in cells into multiple rows. As the current entries are strings, they need to be converted to lists of numbers before calling .explode().

The following code assumes you want box plots, not bars. It also assumes that an entry containing only '0' should stay empty.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

df = pd.read_csv('raw_data.csv')
df = df.rename(columns={df.columns[0]: 'Location'}) # rename the first column to 'Location'

df_melted = df.melt(id_vars='Location', var_name='Dataset', value_name='Datapoint')
df_melted['Datapoint'] = df_melted['Datapoint'].map(
    lambda s: [np.nan] if s == '0' else [float(v) for v in s.split(',')])
df_exploded = df_melted.explode(column='Datapoint', ignore_index=True)
df_exploded['Datapoint'] = pd.to_numeric(df_exploded['Datapoint']) # make sure the column is numeric

g = sns.catplot(data=df_exploded, col='Dataset', kind='box', x='Datapoint', y='Location')
g.set(xlabel='', ylabel='') # remove x and y label

[Figure: sns.catplot from wide dataframe]

PS: df_melted looks like:

   Location Dataset                                          Datapoint
0        P1   92A11                                              [nan]
1        P2   92A11                                       [10.0, 15.0]
2        P3   92A11  [100.0, 40.0, 60.0, 50.0, 70.0, 50.0, 30.0, 40...
3        P4   92A11  [55.0, 40.0, 20.0, 15.0, 15.0, 15.0, 15.0, 10....
4        P5   92A11                                              [nan]
...

df_exploded puts each value from the list into a separate row:

   Location Dataset Datapoint
0        P1   92A11       NaN
1        P2   92A11      10.0
2        P2   92A11      15.0
3        P3   92A11     100.0
4        P3   92A11      40.0
...
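
Since raw_data.csv is not available here, the melt and explode steps can be tried on a small made-up frame. This is only a sketch: the toy values and the helper name parse_cell are assumptions for illustration, not part of the original data. It assumes each cell holds a comma-separated string and that a lone '0' means "no data".

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: comma-separated strings per cell, '0' meaning "no data"
df = pd.DataFrame({
    'Location': ['P1', 'P2', 'P3'],
    '92A11': ['0', '10,15', '100,40,60'],
    '92B11': ['20,30', '0', '5'],
})

def parse_cell(s):
    """Turn '10,15' into [10.0, 15.0]; a lone '0' becomes [nan]."""
    s = str(s).strip()
    return [np.nan] if s == '0' else [float(v) for v in s.split(',')]

# Wide -> long: one row per (Location, Dataset) pair
df_melted = df.melt(id_vars='Location', var_name='Dataset', value_name='Datapoint')
df_melted['Datapoint'] = df_melted['Datapoint'].map(parse_cell)

# One row per individual value; ignore_index=True renumbers the rows 0..n-1
df_exploded = df_melted.explode('Datapoint', ignore_index=True)
df_exploded['Datapoint'] = pd.to_numeric(df_exploded['Datapoint'])

print(df_exploded)
```

The resulting df_exploded has the same three columns as in the answer, so it can be passed to sns.catplot in the same way.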